Authors
N Radha, A Shahina, A Nayeemulla Khan
Publication date
2018
Conference
International Conference on Innovative Computing and Communication
Publisher
Springer
Description
Building an automatic speech recognition (ASR) system for adverse conditions is a challenging task. ASR performance is high in clean environments. However, variabilities such as speaker effects, transmission effects, and environmental conditions degrade the recognition performance of the system. One way to enhance the robustness of an ASR system is to use multiple sources of information about speech. In this work, two additional sources of information on speech are used to build a multimodal ASR system. Throat microphone speech and visual lip reading, which are less susceptible to noise, act as alternate sources of information. Mel-frequency cepstral features are extracted from the throat signal and modeled by an HMM. Pixel-based transformation methods (DCT and DWT) are used to extract features from the visemes of the video data, which are also modeled by an HMM. Throat and visual features are combined at the feature …
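The pixel-based DCT feature extraction mentioned in the description can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `dct_matrix` and `dct2_features`, the 8x8 region size, and the choice to keep a 4x4 block of low-frequency coefficients are all assumptions made for the example.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row scaling that makes the matrix orthonormal
    return m

def dct2_features(roi: np.ndarray, keep: int = 4) -> np.ndarray:
    """Apply a 2-D DCT to a grayscale mouth region and keep the
    top-left keep x keep block of low-frequency coefficients.
    (Hypothetical helper; the block size is an assumption.)"""
    rows, cols = roi.shape
    coeffs = dct_matrix(rows) @ roi @ dct_matrix(cols).T
    return coeffs[:keep, :keep].ravel()

# Example: a stand-in 8x8 grayscale patch in place of a real lip region.
roi = np.ones((8, 8))
features = dct2_features(roi, keep=4)
```

For a real video frame, `roi` would be a cropped grayscale mouth region, and the resulting vector would be one per-frame visual observation for the HMM.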
Scholar articles
N Radha, A Shahina, A Nayeemulla Khan - International Conference on Innovative Computing and …, 2019