Improving Recognition of Speech System using Multi-modal Approach

Authors

N Radha, A Shahina, A Nayeemulla Khan

Publication date

2018

Conference

International Conference on Innovative Computing and Communication

Publisher

Springer

Description

Building an automatic speech recognition (ASR) system for adverse conditions is a challenging task. ASR performance is high in clean environments, but variabilities such as speaker effects, transmission effects, and environmental conditions degrade recognition performance. One way to enhance the robustness of an ASR system is to use multiple sources of information about speech. In this work, two additional sources of information on speech are used to build a multimodal ASR system. Throat microphone speech and visual lip reading, which are less susceptible to noise, act as alternate sources of information. Mel-frequency cepstral features are extracted from the throat signal and modeled by a hidden Markov model (HMM). Pixel-based transformation methods, the discrete cosine transform (DCT) and the discrete wavelet transform (DWT), are used to extract features from the visemes in the video data, which are likewise modeled by an HMM. Throat and visual features are combined at the feature …
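As a rough illustration of the pixel-based DCT step mentioned in the description, the sketch below computes an orthonormal 2D DCT-II of a small grayscale patch and keeps the low-frequency coefficients as a feature vector. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `dct2` and `low_frequency_features`, the 4x4 patch values, and the choice to keep the top-left `keep` x `keep` coefficients are all hypothetical, since the description does not say how coefficients are selected.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of a rectangular grayscale block (list of rows)."""
    rows, cols = len(block), len(block[0])

    def scale(k, n):
        # Normalisation factors that make the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for x in range(rows):
                for y in range(cols):
                    total += (
                        block[x][y]
                        * math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                        * math.cos(math.pi * (2 * y + 1) * v / (2 * cols))
                    )
            out[u][v] = scale(u, rows) * scale(v, cols) * total
    return out


def low_frequency_features(block, keep=3):
    """Flatten the top-left keep x keep DCT coefficients into a feature vector.

    Keeping a square low-frequency corner is an assumption made for this
    sketch; other selection schemes are equally plausible.
    """
    coeffs = dct2(block)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


# Hypothetical 4x4 mouth-region patch of grayscale intensities.
patch = [
    [52, 55, 61, 66],
    [63, 59, 55, 90],
    [62, 59, 68, 113],
    [63, 58, 71, 122],
]
features = low_frequency_features(patch, keep=2)
```

In a pipeline of the kind described, a vector like `features` would be computed per video frame and passed to the HMM as the observation sequence; the direct quadruple loop is kept for readability and would normally be replaced by a library transform.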

Total citations

Cited by 5

Citations per year: 2020: 1, 2021: 2, 2022: 2

Scholar articles

Improving recognition of speech system using multimodal approach

N Radha, A Shahina, A Nayeemulla Khan - International Conference on Innovative Computing and …, 2019
