Recent advances in the automatic recognition of audiovisual speech

G Potamianos, C Neti, G Gravier, A Garg… - Proceedings of the …, 2003 - ieeexplore.ieee.org
Visual speech information from the speaker's mouth region has been successfully shown to
improve noise robustness of automatic speech recognizers, thus promising to extend their …

A review of recent advances in visual speech decoding

Z Zhou, G Zhao, X Hong, M Pietikäinen - Image and vision computing, 2014 - Elsevier
Visual speech information plays an important role in automatic speech recognition (ASR)
especially when audio is corrupted or even inaccessible. Despite the success of audio …

Large-scale visual speech recognition

B Shillingford, Y Assael, MW Hoffman, T Paine… - arXiv preprint arXiv …, 2018 - arxiv.org
This work presents a scalable solution to open-vocabulary visual speech recognition. To
achieve this, we constructed the largest existing visual speech recognition dataset …

LRW-1000: A naturally-distributed large-scale benchmark for lip reading in the wild

S Yang, Y Zhang, D Feng, M Yang… - 2019 14th IEEE …, 2019 - ieeexplore.ieee.org
Large-scale datasets have successively proven their fundamental importance in several
research fields, especially for early progress in some emerging topics. In this paper, we …

Audio-visual speech modeling for continuous speech recognition

S Dupont, J Luettin - IEEE transactions on multimedia, 2000 - ieeexplore.ieee.org
This paper describes a speech recognition system that uses both acoustic and visual
speech information to improve recognition performance in noisy environments. The system …

Lipreading with local spatiotemporal descriptors

G Zhao, M Barnard… - IEEE Transactions on …, 2009 - ieeexplore.ieee.org
Visual speech information plays an important role in lipreading under noisy conditions or for
listeners with a hearing impairment. In this paper, we present local spatiotemporal …

Audio-visual automatic speech recognition: An overview

G Potamianos, C Neti, J Luettin… - Issues in visual and audio …, 2004 - academia.edu
We have made significant progress in automatic speech recognition (ASR) for well-defined
applications like dictation and medium vocabulary transaction processing tasks in relatively …

CUAVE: A new audio-visual database for multimodal human-computer interface research

EK Patterson, S Gurbuz, Z Tufekci… - 2002 IEEE International …, 2002 - ieeexplore.ieee.org
Multimodal signal processing has become an important topic of research for overcoming
certain problems of audio-only speech processing. Audio-visual speech recognition is one …

LCANet: End-to-end lipreading with cascaded attention-CTC

K Xu, D Li, N Cassimatis, X Wang - 2018 13th IEEE …, 2018 - ieeexplore.ieee.org
Machine lipreading is a special type of automatic speech recognition (ASR) which
transcribes human speech by visually interpreting the movement of related face regions …

Deep learning for visual speech analysis: A survey

C Sheng, G Kuang, L Bai, C Hou, Y Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …