Survey on automatic lip-reading in the era of deep learning

A Fernandez-Lopez, FM Sukno - Image and Vision Computing, 2018 - Elsevier
In the last few years, there has been an increasing interest in developing systems for
Automatic Lip-Reading (ALR). Similarly to other computer vision applications, methods …

Review on research progress of machine lip reading

G Pu, H Wang - The Visual Computer, 2023 - Springer
Abstract Machine lip reading recognizes text content through the speaker's lip motion
information. Lip reading has significant research and application value. With the continuous …

Improved speaker independent lip reading using speaker adaptive training and deep neural networks

I Almajai, S Cox, R Harvey, Y Lan - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Recent improvements in tracking and feature extraction mean that speaker-dependent
lip-reading of continuous speech using a medium size vocabulary (around 1000 words) is …

CroMM-VSR: Cross-modal memory augmented visual speech recognition

M Kim, J Hong, SJ Park, YM Ro - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Visual Speech Recognition (VSR) is a task that recognizes speech from external
appearances of the face (i.e., lips) into text. Since the information from the visual lip movements …

Towards estimating the upper bound of visual-speech recognition: The visual lip-reading feasibility database

A Fernandez-Lopez, O Martinez… - 2017 12th IEEE …, 2017 - ieeexplore.ieee.org
Speech is the most used communication method between humans and it involves the
perception of auditory and visual channels. Automatic speech recognition focuses on …

Generating intelligible audio speech from visual speech

T Le Cornu, B Milner - IEEE/ACM Transactions on Audio …, 2017 - ieeexplore.ieee.org
This paper is concerned with generating intelligible audio speech from a video of a person
talking. Regression and classification methods are proposed first to estimate static spectral …

Decoding visemes: Improving machine lip-reading

HL Bear, R Harvey - 2016 IEEE International Conference on …, 2016 - ieeexplore.ieee.org
To undertake machine lip-reading, we try to recognise speech from a visual signal. Current
work often uses viseme classification supported by language models with varying degrees …

A survey on mouth modeling and analysis for sign language recognition

E Antonakos, A Roussos… - 2015 11th IEEE …, 2015 - ieeexplore.ieee.org
Around 70 million Deaf worldwide use Sign Languages (SLs) as their native languages. At
the same time, they have limited reading/writing skills in the spoken language. This puts …

An effective conversion of visemes to words for high-performance automatic lipreading

S Fenghour, D Chen, K Guo, B Li, P Xiao - Sensors, 2021 - mdpi.com
As an alternative approach, viseme-based lipreading systems have demonstrated promising
performance results in decoding videos of people uttering entire sentences. However, the …

Large-scale unsupervised audio pre-training for video-to-speech synthesis

T Kefalas, Y Panagakis, M Pantic - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Video-to-speech synthesis is the task of reconstructing the speech signal from a silent video
of a speaker. Previous approaches train on data from almost exclusively audio-visual …