Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Survey on automatic lip-reading in the era of deep learning

A Fernandez-Lopez, FM Sukno - Image and Vision Computing, 2018 - Elsevier
In the last few years, there has been an increasing interest in developing systems for
Automatic Lip-Reading (ALR). Similarly to other computer vision applications, methods …

Audio-visual speech and gesture recognition by sensors of mobile devices

D Ryumin, D Ivanko, E Ryumina - Sensors, 2023 - mdpi.com
Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable
speech recognition, particularly when audio is corrupted by noise. Additional visual …

End-to-end audiovisual speech recognition

S Petridis, T Stafylakis, P Ma, F Cai… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
Several end-to-end deep learning approaches have been recently presented which extract
either audio or visual features from the input images or audio signals and perform speech …

Multimodal sparse transformer network for audio-visual speech recognition

Q Song, B Sun, S Li - IEEE Transactions on Neural Networks …, 2022 - ieeexplore.ieee.org
Automatic speech recognition (ASR) is the major human–machine interface in many
intelligent systems, such as intelligent homes, autonomous driving, and servant robots …

Towards practical lipreading with distilled and efficient models

P Ma, B Martinez, S Petridis… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Lipreading has witnessed a lot of progress due to the resurgence of neural networks. Recent
works have placed emphasis on aspects such as improving performance by finding the …

Audio-visual speech recognition with a hybrid CTC/attention architecture

S Petridis, T Stafylakis, P Ma… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
Recent works in speech recognition rely either on connectionist temporal classification
(CTC) or sequence-to-sequence models for character-level recognition. CTC assumes …

Lip-reading with densely connected temporal convolutional networks

P Ma, Y Wang, J Shen, S Petridis… - Proceedings of the …, 2021 - openaccess.thecvf.com
In this work, we present the Densely Connected Temporal Convolutional Network (DC-TCN)
for lip-reading of isolated words. Although Temporal Convolutional Networks (TCN) have …

Deep learning-based automated lip-reading: A survey

S Fenghour, D Chen, K Guo, B Li, P Xiao - IEEE Access, 2021 - ieeexplore.ieee.org
A survey on automated lip-reading approaches is presented in this paper with the main
focus being on deep learning related methodologies which have proven to be more fruitful …

Lip reading sentences using deep learning with only visual cues

S Fenghour, D Chen, K Guo, P Xiao - IEEE Access, 2020 - ieeexplore.ieee.org
In this paper, a neural network-based lip reading system is proposed. The system is
lexicon-free and uses purely visual cues. With only a limited number of visemes as classes to …