
Combining audio and visual speech recognition using LSTM and deep convolutional neural network

Authors

R Shashidhar, S Patilkulkarni, SB Puneeth

Publication date

2022/12

Journal

International Journal of Information Technology

Volume

Issue

Pages

3425-3436

Publisher

Springer Nature Singapore

Description

Human speech is bimodal: audio speech refers to the speaker's acoustic waveform, while lip movements constitute visual speech. Audiovisual speech recognition (AVSR) is an emerging research field, particularly valuable when the audio is corrupted by noise. For the proposed AVSR system, a custom English-language dataset was created. Mel-frequency cepstral coefficients (MFCCs) were used for audio processing and a Long Short-Term Memory (LSTM) network for visual speech recognition. Finally, the audio and visual streams were integrated into a single platform using a deep neural network. The results show accuracies of 90% for audio speech recognition, 71% for visual speech recognition, and 91% for audiovisual speech recognition, better than existing approaches. Ultimately, the model was able to make many appropriate decisions when predicting the spoken word for …

Total citations

Cited by 44

Citations per year: 2022: 9, 2023: 17, 2024: 18
