In recent years, human lip-readers have increasingly been presented as valuable in the assembly of scientific evidence. However, like all human observers, they suffer from variability when interpreting lip movements. Here, an intelligent system is designed to predict the spoken word from lip reading automatically. The proposed audio-visual speech recognition (AVSR) system uses a local proprietary dataset to detect the English word spoken by the speaker in a video, using a feed-forward neural network (FFNN) and a long short-term memory (LSTM) network. The selected audio features are Mel-frequency cepstral coefficients (MFCC), mel spectrogram, spectral contrast, tonnetz, and chroma. For the visual model, the displacement of landmark points around the speaker's lips between the current frame and the previous frame is used as the feature. These features are extracted from every video in the dataset. A deep neural network with a feed-forward architecture is trained on the extracted audio features, and an LSTM recurrent neural network is developed on the extracted visual features. The audio-only and visual-only models reach accuracies of 91.42% and 80%, respectively. Finally, the audio and visual models are integrated using a feed-forward neural network, which allows the combined model to make more reliable decisions when predicting the spoken word. The integrated model achieves an accuracy of 92.38%.
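
The five audio feature families named above (MFCC, mel spectrogram, spectral contrast, tonnetz, chroma) correspond to a widely used librosa extraction recipe; the sketch below shows one plausible version of it, assuming per-file averaging over time to obtain a fixed-length vector. The function name, MFCC count, and averaging choice are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the audio feature extraction, assuming the common
# librosa recipe for these five feature families. Frame-wise features are
# averaged over time (an assumption) to yield one fixed-length vector.
import numpy as np
import librosa

def extract_audio_features(wav_path):
    """Concatenate per-file means of MFCC, mel spectrogram,
    spectral contrast, tonnetz, and chroma features."""
    y, sr = librosa.load(wav_path)
    stft = np.abs(librosa.stft(y))
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sr), axis=1)
    tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr), axis=1)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr), axis=1)
    return np.concatenate([mfcc, mel, contrast, tonnetz, chroma])
```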
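
The visual feature, the frame-to-frame displacement of points around the lips, could be computed as sketched below. The paper does not name its landmark detector, so dlib's 68-point predictor (where indices 48-67 cover the mouth) is assumed here purely for illustration.

```python
# Hedged sketch of the visual feature: displacements of lip landmark
# coordinates between consecutive frames. The dlib 68-point predictor
# is an assumption; the paper does not specify its landmark detector.
import numpy as np
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def lip_points(frame):
    """Return the 20 mouth landmarks (indices 48-67) as a (20, 2) array, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return np.array([(shape.part(i).x, shape.part(i).y)
                     for i in range(48, 68)], dtype=np.float32)

def lip_delta_sequence(video_path):
    """Return flattened lip-point displacements between consecutive frames."""
    cap = cv2.VideoCapture(video_path)
    prev, deltas = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        pts = lip_points(frame)
        if pts is not None and prev is not None:
            deltas.append((pts - prev).flatten())  # 40-dim feature per frame
        if pts is not None:
            prev = pts
    cap.release()
    return np.stack(deltas) if deltas else np.empty((0, 40))
```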
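
Finally, the three networks named in the abstract (a feed-forward audio classifier, an LSTM visual classifier, and a feed-forward fusion network over their outputs) might look roughly as follows in Keras. All layer sizes, the sequence length, and the vocabulary size are assumptions for illustration; the abstract gives none of these.

```python
# Illustrative Keras sketches of the three models; hyperparameters are
# assumptions, not values reported in the paper.
from tensorflow.keras import layers, models

NUM_WORDS = 10              # assumed vocabulary size
SEQ_LEN, DELTA_DIM = 25, 40 # assumed frames per clip and lip-delta size

audio_model = models.Sequential([
    layers.Input(shape=(193,)),                 # concatenated audio features
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_WORDS, activation="softmax"),
])

visual_model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, DELTA_DIM)),   # per-frame lip displacements
    layers.LSTM(128),
    layers.Dense(NUM_WORDS, activation="softmax"),
])

fusion_model = models.Sequential([
    layers.Input(shape=(2 * NUM_WORDS,)),       # stacked audio+visual softmax outputs
    layers.Dense(32, activation="relu"),
    layers.Dense(NUM_WORDS, activation="softmax"),
])
```

Feeding the fusion network the two class-probability vectors, rather than raw features, is one common late-fusion design consistent with the abstract's description of integrating the trained audio and visual models.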