SEMG Approach for Speech Recognition

Siddesh Bhimrao Shisode; Bhavesh Mhatre, Jeet Sikligar, Sushil Vishwakarma; Supriya Tupe, Sheetal Jagtap, Milind Nemade

Published: Jun 19, 2023

DOI: https://doi.org/10.26483/ijarcs.v14i3.6970

Keywords:

CNN, GMM, Speech Recognition, Image Processing

Siddesh Bhimrao Shisode

K.J Somaiya Institute of Technology

https://orcid.org/0000-0002-5203-147X

Abstract

Speech is the most familiar and habitual means of communication for most people. Because of speech disabilities, many people find it difficult to voice their views properly and are therefore at a disadvantage. This research addresses the lack of audible speech from a speech-impaired user by recognizing the intended speech with machine learning (ML) models such as the Gaussian Mixture Model (GMM) and the Convolutional Neural Network (CNN). With properly recorded and cleaned activity from the facial muscles, it is possible to predict the words being uttered or whispered with a certain accuracy. The intended system will also include a visual aid system, which can provide better accuracy when used together with the system based on facial muscle activity. Neuromuscular signals from the speech-articulating muscles are recorded using surface electromyography (SEMG) sensors and are used to train the machine learning models. In this paper we demonstrate various signals synthesized through the electromyography system and show how they can be classified using machine learning models such as the Gaussian Mixture Model and, for the vision-based lip-reading system, the Convolutional Neural Network.
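To make the classification idea in the abstract concrete, here is a minimal sketch of one common way to classify windowed EMG-like signals with one Gaussian mixture model per word. It is not the authors' pipeline. Everything in it is an assumption chosen for illustration: the data are synthetic Gaussian noise rather than recorded SEMG, the three features (RMS, mean absolute value, zero-crossing rate), the labels "yes" and "no", the window length, and the two-component diagonal-covariance mixtures are all hypothetical.

```python
import numpy as np

def emg_features(window):
    """Return [RMS, mean absolute value, zero-crossing rate] for a 1-D window."""
    window = np.asarray(window, dtype=float)
    rms = np.sqrt(np.mean(window ** 2))
    mav = np.mean(np.abs(window))
    zcr = np.mean(np.signbit(window[:-1]) != np.signbit(window[1:]))
    return np.array([rms, mav, zcr])

def log_gauss_diag(X, means, variances):
    """Log density of each row of X under each diagonal Gaussian, shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def fit_gmm(X, k=2, iters=50, seed=0):
    """Fit a k-component diagonal-covariance GMM to X with plain EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        log_p = np.log(weights) + log_gauss_diag(X, means, variances)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + 1e-6
    return weights, means, variances

def gmm_log_likelihood(X, gmm):
    """Total log-likelihood of all frames in X under one fitted GMM."""
    weights, means, variances = gmm
    return logsumexp(np.log(weights) + log_gauss_diag(X, means, variances), axis=1).sum()

def classify(X, gmms):
    """Pick the label whose GMM gives the frames X the highest log-likelihood."""
    return max(gmms, key=lambda label: gmm_log_likelihood(X, gmms[label]))

def synth_frames(scale, n_windows, rng, win=200):
    """Synthetic stand-in for SEMG: noise windows whose amplitude depends on the word."""
    return np.array([emg_features(rng.normal(0, scale, win)) for _ in range(n_windows)])

rng = np.random.default_rng(0)
train = {"yes": synth_frames(1.0, 60, rng), "no": synth_frames(3.0, 60, rng)}
gmms = {label: fit_gmm(frames, k=2) for label, frames in train.items()}
predicted = classify(synth_frames(3.0, 10, rng), gmms)
print(predicted)
```

Training one mixture per word and choosing the word with the highest likelihood keeps each model small and lets new words be added without retraining the others. Real SEMG recordings would need filtering and multi-channel features that this sketch leaves out.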

Issue

Vol. 14 No. 3 (2023): May-June 2023

Section

Articles

COPYRIGHT

Submission of a manuscript implies: that the work described has not been published before, that it is not under consideration for publication elsewhere; that if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
The journal allows the author(s) to retain publishing rights without restrictions.
The journal allows the author(s) to hold the copyright without restrictions.

References

@article{b1,

author = {M. Janke and L. Diener},

journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},

month = {December},

number = {12},

pages = {2375-2385},

title = {EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals},

volume = {25},

year = {2017},

doi = {10.1109/TASLP.2017.2738568},

}

@inproceedings{b2,

author = {Janke, Matthias and Wand, Michael and Nakamura, Keigo and Schultz, Tanja},

booktitle = {2012 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},

pages = {365-368},

publisher = {IEEE},

title = {Further investigations on EMG-to-speech conversion},

year = {2012},

}

@article{b3,

author = {G. S. Meltzner and J. T. Heaton and Y. Deng and G. De Luca and S. H. Roy and J. C. Kline},

journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},

month = {December},

number = {12},

pages = {2386-2398},

title = {Silent Speech Recognition as an Alternative Communication Device for Persons with Laryngectomy},

volume = {25},

year = {2017},

doi = {10.1109/TASLP.2017.2740000},

}

@inproceedings{b4,

author = {Jou, Stan and Schultz, Tanja and Walliczek, Matthias and Kraft, Florian and Waibel, Alex},

booktitle = {Ninth International Conference on Spoken Language Processing},

title = {Towards continuous speech recognition using surface electromyography},

year = {2006},

}

@inproceedings{b5,

author = {Gondaliya, Yash and Srinivasan, Vishaka and Malvia, Neha and Harbada, Manav and Jagtap, Sheetal},

booktitle = {Proceedings of the 4th International Conference on Advances in Science and Technology (ICAST2021)},

title = {Voiceless Speech Recognition System},

year = {2021},

}

@inproceedings{b6,

author = {W. C. Yau and S. P. Arjunan and D. K. Kumar},

booktitle = {TENCON 2008-2008 IEEE Region 10 Conference},

pages = {1-6},

title = {Classification of voiceless speech using facial muscle activity and vision-based techniques},

year = {2008},

}

@article{b7,

author = {Umesh Agnihotri and Ajat Shatru Arora and Atik Garg},

journal = {International Journal of Engineering Research \& Technology (IJERT), ACMEE},

title = {Vowel Recognition using Facial Movement (SEMG) for Speech Control based HCI},

volume = {4},

year = {2016},

doi = {10.17577/IJERTCONV4IS15044},

}

@inproceedings{b8,

author = {Morse, MS and Gopalan, YN and Wright, M},

booktitle = {Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society Volume 13: 1991},

pages = {1877-1878},

title = {Speech Recognition Using Myoelectric Signals With Neural Networks},

year = {1991},

}

@article{b9,

author = {Morse, Michael S and O'Brien, Edward M},

journal = {Computers in biology and medicine},

number = {6},

pages = {399-410},

title = {Research summary of a scheme to ascertain the availability of speech information in the myoelectric signals of neck and head muscle using surface electrodes},

volume = {16},

year = {1986},

}

@article{b10,

author = {Alan J. Fridlund and John T. Cacioppo},

journal = {Psychophysiology},

number = {5},

pages = {567-589},

title = {Guidelines for Human Electromyographic Research},

volume = {23},

year = {1986},

}

@inproceedings{b11,

author = {C. Jorgensen and D. D. Lee and S. Agabon},

booktitle = {Proceedings of the International Joint Conference on Neural Networks},

pages = {3128-3133},

publisher = {IEEE},

title = {Sub auditory speech recognition based on EMG signals},

year = {2003},

}

@inproceedings{b12,

author = {L. Lu and X. Zhang and X. Xu and Z. Wu},

booktitle = {2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS)},

pages = {529-532},

title = {Homeomorphic manifold analysis: Learning motion features of image sequence for lipreading},

year = {2015},

}

@article{b13,

author = {D. Michelsanti and others},

journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},

pages = {1368-1396},

title = {An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation},

volume = {29},

year = {2021},

doi = {10.1109/TASLP.2021.3066303},

}

@inproceedings{b14,

author = {Meltzner, Geoffrey S and Sroka, Jason and Heaton, James T and Gilmore, L Donald and Colby, Glen and Roy, Serge and Chen, Nancy and De Luca, Carlo J},

booktitle = {Ninth Annual Conference of the International Speech Communication Association},

pages = {2667--2670},

title = {Speech recognition for vocalized and subvocal modes of production using surface EMG signals from the neck and face},

year = {2008},

}

@article{b15,

author = {D. Margam and R. Aralikatti and T. Sharma and A. Thanda},

journal = {arXiv preprint arXiv:1906.12170},

month = {June},

title = {LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models},

year = {2019},

doi = {10.48550/ARXIV.1906.12170},

}

@book{b16,

author = {Liew, Alan Wee-Chung and Wang, Shilin},

publisher = {IGI Global},

title = {Visual Speech Recognition: Lip Segmentation and Mapping},

year = {2009},

doi = {10.4018/978-1-60566-186-5},

}

@inproceedings{b17,

author = {Shashidhar, R and Patilkulkarni, S and Puneeth, SB},

booktitle = {2020 IEEE International Conference for Innovation in Technology (INOCON)},

pages = {1-5},

title = {Audio Visual Speech Recognition using Feed Forward Neural Network Architecture},

year = {2020},

}

@article{b18,

author = {Fenghour, Souheil and Chen, Daqing and Guo, Kun and Xiao, Perry},

journal = {IEEE Access},

pages = {215516-215530},

title = {Lip Reading Sentences Using Deep Learning With Only Visual Cues},

volume = {8},

year = {2020},

doi = {10.1109/ACCESS.2020.3040906},

}

@article{b19,

author = {Mendoza, Luis Enrique and Peña, Jesus and Valencia, Jairo Lenin Ram},

journal = {Journal of Technology},

number = {2},

pages = {35-41},

title = {Electromyographic patterns of sub-vocal Speech: Records and classification},

volume = {12},

year = {2013},

doi = {10.18270/rt.v12i2.758},

}

@inproceedings{b20,

author = {Maier-Hein, Lena and Metze, Florian and Schultz, Tanja and Waibel, Alex},

booktitle = {IEEE Workshop on Automatic Speech Recognition and Understanding},

pages = {331-336},

title = {Session independent non-audible speech recognition using surface electromyography},

year = {2005},

}

@inproceedings{b21,

address = {Ft. Lauderdale, Florida},

author = {Manabe, Hiroyuki and Hiraiwa, Akira and Sugimura, Toshiaki},

booktitle = {CHI'03 extended abstracts on Human factors in computing systems},

pages = {794-795},

publisher = {ACM},

title = {Unvoiced speech recognition using EMG-mime speech recognition},

year = {2003},

}

@inproceedings{b22,

author = {Kain, Alexander and Macon, Michael W},

booktitle = {Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP'98 (Cat. No. 98CH36181)},

pages = {285-288},

publisher = {IEEE},

title = {Spectral voice conversion for text-to-speech synthesis},

year = {1998},

}

@inproceedings{b23,

author = {Graves, Alex and Mohamed, Abdel-rahman and Hinton, Geoffrey},

booktitle = {2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},

pages = {6645-6649},

title = {Speech recognition with deep recurrent neural networks},

year = {2013},

}

@inproceedings{b24,

author = {Kumar, Sanjay and Kumar, Dinesh K and Alemu, Melaku and Burry, Mark},

booktitle = {Proceedings of the 2004 Intelligent Sensors, Sensor Networks and Information Processing Conference, 2004.},

pages = {593-597},

title = {EMG based voice recognition},

year = {2004},

}

@inproceedings{b25,

author = {Arjunan, Sridhar P and Kumar, Dinesh K and Yau, Wai C and Weghorn, Hans},

booktitle = {2006 International Conference of the IEEE Engineering in Medicine and Biology Society},

pages = {2191-2194},

title = {Unspoken Vowel Recognition Using Facial Electromyogram},

year = {2006},

}

@article{b26,

author = {Toda, Tomoki and Black, Alan W and Tokuda, Keiichi},

journal = {Speech communication},

number = {3},

pages = {215-227},

title = {Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model},

volume = {50},

year = {2008},

}

@inproceedings{b27,

author = {Toda, Tomoki and Black, Alan W and Tokuda, Keiichi},

booktitle = {Fifth ISCA Workshop on Speech Synthesis},

title = {Mapping from articulatory movements to vocal tract spectrum with Gaussian mixture model for articulatory speech synthesis},

year = {2004},

}

@article{b28,

author = {Toda, Tomoki and Shikano, Kiyohiro},

title = {NAM-to-speech conversion with Gaussian mixture models},

year = {2005},

}

@article{b29,

author = {Chan, Adrian DC and Englehart, Kevin and Hudgins, Bernard and Lovely, Dennis F},

journal = {Medical and Biological Engineering and Computing},

pages = {500-504},

title = {Myo-electric signals to augment speech recognition},

volume = {39},

year = {2001},

doi = {10.1007/BF02345373},

}

@inproceedings{b30,

author = {Yau, Wai Chee and Kumar, Dinesh Kant and Arjunan, Sridhar Poosapadi},

booktitle = {Proceedings of the HCSNet workshop on Use of vision in human-computer interaction-Volume 56},

pages = {93-101},

title = {Voiceless speech recognition using dynamic visual speech features},

year = {2006},

}

@book{b31,

author = {Petajan, Eric David},

publisher = {University of Illinois at Urbana-Champaign},

title = {Automatic lipreading to enhance speech recognition (speech reading)},

year = {1984},

}

@inproceedings{b32,

author = {Gordan, Mihaela and Kotropoulos, Constantine and Pitas, Ioannis},

booktitle = {Proceedings. International Conference on Image Processing},

pages = {III--III},

title = {Application of support vector machines classifiers to visual speech recognition},

year = {2002},

}
