作者
N. Radha, A. Shahina, A. Nayeemulla Khan
发表日期
2016/12
期刊
Indian Journal of Science and Technology
卷号
9
期号
44
页码范围
1-7
出版商
indjst
简介
Objectives: This paper proposes a method to improve the performance of a Visual Speech Recognition (VSR) system by combining the pixel-based and geometry-based features, so as to augment the performance of audio based Automatic Speech Recognition (ASR) systems in adverse conditions. Methods/Statistical Analysis: A video database comprising of 11000 utterances of isolated words, collected from 20 speakers, is used in this study. Pixel based features (DCT and DWT) and geometric features (Active Shape Model or ASM) are fused at two levels, one at the feature level and the other at the decision level. A simple Gaussian mixture HMM word model is built for feature level fusion, while a two stream HMM model is built for decision level fusion. Findings: The VSR system built using the combined features shows a significant improvement in performance when compared to individual VSR systems built using pixel and geometric based features. The accuracy of the individual system is 76% for geometric features, 64% for DCT and 72% for DWT pixel-based features. The performance improves for combined features with an accuracy of 80% for ASM+ DCT and 84.7% for DWT+ ASM. A weighted decision level fusion result in further improvement, with an accuracy of 84% for ASM+ DCT and 92% for ASM+ DWT. Application/Improvements: The combined VSR could be preferred over individual pixel/geometric feature based systems to augment the performance of audio based Automatic Speech Recognition (ASR) systems in adverse conditions. Further studies on improving the VSR system, which could be used in lieu of audio-based ASR …
引用总数
2018201920202021202212212