Authors
R Shashidhar, S Patilkulkarni, SB Puneeth
Publication date
2022/12
Journal
International Journal of Information Technology
Volume
14
Issue
7
Pages
3425-3436
Publisher
Springer Nature Singapore
Description
Human speech is bimodal: audio speech is the speaker's acoustic waveform, and visual speech is the speaker's lip motion. Audiovisual Speech Recognition (AVSR) is an emerging field of research, particularly for cases where the audio is corrupted by noise. For the proposed AVSR system, a custom dataset was designed for the English language. Mel Frequency Cepstral Coefficients (MFCC) were used for audio processing and a Long Short-Term Memory (LSTM) network for visual speech recognition. Finally, the audio and visual streams were integrated into a single platform using a deep neural network. The results showed an accuracy of 90% for audio speech recognition, 71% for visual speech recognition, and 91% for audiovisual speech recognition, which was better than existing approaches. Ultimately, the model was able to make many suitable decisions when predicting the spoken word for …
Scholar articles
R Shashidhar, S Patilkulkarni, SB Puneeth - International Journal of Information Technology, 2022