Recent advances in the automatic recognition of audiovisual speech

G Potamianos, C Neti, G Gravier, A Garg… - Proceedings of the …, 2003 - ieeexplore.ieee.org
Visual speech information from the speaker's mouth region has been successfully shown to
improve noise robustness of automatic speech recognizers, thus promising to extend their …

Audio-visual automatic speech recognition: An overview

G Potamianos, C Neti, J Luettin… - Issues in visual and audio …, 2004 - academia.edu
We have made significant progress in automatic speech recognition (ASR) for well-defined
applications like dictation and medium vocabulary transaction processing tasks in relatively …

Diffusion maps

RR Coifman, S Lafon - Applied and computational harmonic analysis, 2006 - Elsevier
In this paper, we provide a framework based upon diffusion processes for finding meaningful
geometric descriptions of data sets. We show that eigenfunctions of Markov matrices can be …

Audio-visual speech modeling for continuous speech recognition

S Dupont, J Luettin - IEEE transactions on multimedia, 2000 - ieeexplore.ieee.org
This paper describes a speech recognition system that uses both acoustic and visual
speech information to improve recognition performance in noisy environments. The system …

Extraction of visual features for lipreading

I Matthews, TF Cootes, JA Bangham… - … on Pattern Analysis …, 2002 - ieeexplore.ieee.org
The multimodal nature of speech is often ignored in human-computer interaction, but lip
deformations and other body motion, such as those of the head, convey additional …

Data fusion and multicue data matching by diffusion maps

S Lafon, Y Keller, RR Coifman - IEEE Transactions on pattern …, 2006 - ieeexplore.ieee.org
Data fusion and multicue data matching are fundamental tasks of high-dimensional data
analysis. In this paper, we apply the recently introduced diffusion framework to address …

Audio visual speech recognition

C Neti, G Potamianos, J Luettin, I Matthews, H Glotin… - 2000 - infoscience.epfl.ch
We have made significant progress in automatic speech recognition (ASR) for well-defined
applications like dictation and medium vocabulary transaction processing tasks in relatively …

A review of speech-based bimodal recognition

CC Chibelushi, F Deravi… - IEEE transactions on …, 2002 - ieeexplore.ieee.org
Speech recognition and speaker recognition by machine are crucial ingredients for many
important applications such as natural and flexible human-machine interfaces. Most …

Lipformer: learning to lipread unseen speakers based on visual-landmark transformers

F Xue, Y Li, D Liu, Y Xie, L Wu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Lipreading refers to understanding and further translating the speech of a video speaker into
textual outputs. State-of-the-art lipreading methods excel in interpreting overlap speakers, i.e. …

Pushing the boundaries of audiovisual word recognition using residual networks and LSTMs

T Stafylakis, MH Khan, G Tzimiropoulos - Computer Vision and Image …, 2018 - Elsevier
Visual and audiovisual speech recognition are witnessing a renaissance which is largely
due to the advent of deep learning methods. In this paper, we present a deep learning …