End-to-end audiovisual fusion with LSTMs

Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org

Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Cited by 62 · Related articles · All 2 versions

Survey on automatic lip-reading in the era of deep learning

A Fernandez-Lopez, FM Sukno - Image and Vision Computing, 2018 - Elsevier

In the last few years, there has been an increasing interest in developing systems for
Automatic Lip-Reading (ALR). Similarly to other computer vision applications, methods …

Cited by 158 · Related articles · All 3 versions

Audio-visual speech and gesture recognition by sensors of mobile devices

D Ryumin, D Ivanko, E Ryumina - Sensors, 2023 - mdpi.com

Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable
speech recognition, particularly when audio is corrupted by noise. Additional visual …

Cited by 75 · Related articles · All 9 versions

End-to-end audiovisual speech recognition

S Petridis, T Stafylakis, P Ma, F Cai… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org

Several end-to-end deep learning approaches have been recently presented which extract
either audio or visual features from the input images or audio signals and perform speech …

Cited by 336 · Related articles · All 12 versions

Multimodal sparse transformer network for audio-visual speech recognition

Q Song, B Sun, S Li - IEEE Transactions on Neural Networks …, 2022 - ieeexplore.ieee.org

Automatic speech recognition (ASR) is the major human–machine interface in many
intelligent systems, such as intelligent homes, autonomous driving, and servant robots …

Cited by 79 · Related articles · All 3 versions

Towards practical lipreading with distilled and efficient models

P Ma, B Martinez, S Petridis… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

Lipreading has witnessed a lot of progress due to the resurgence of neural networks. Recent
works have placed emphasis on aspects such as improving performance by finding the …

Cited by 124 · Related articles · All 6 versions

Audio-visual speech recognition with a hybrid CTC/attention architecture

S Petridis, T Stafylakis, P Ma… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org

Recent works in speech recognition rely either on connectionist temporal classification
(CTC) or sequence-to-sequence models for character-level recognition. CTC assumes …

Cited by 162 · Related articles · All 9 versions

Lip-reading with densely connected temporal convolutional networks

P Ma, Y Wang, J Shen, S Petridis… - Proceedings of the …, 2021 - openaccess.thecvf.com

In this work, we present the Densely Connected Temporal Convolutional Network (DC-TCN)
for lip-reading of isolated words. Although Temporal Convolutional Networks (TCN) have …

Cited by 69 · Related articles · All 10 versions

Deep learning-based automated lip-reading: A survey

S Fenghour, D Chen, K Guo, B Li, P Xiao - IEEE Access, 2021 - ieeexplore.ieee.org

A survey on automated lip-reading approaches is presented in this paper with the main
focus being on deep learning related methodologies which have proven to be more fruitful …

Cited by 57 · Related articles · All 4 versions

Lip reading sentences using deep learning with only visual cues

S Fenghour, D Chen, K Guo, P Xiao - IEEE Access, 2020 - ieeexplore.ieee.org

In this paper, a neural network-based lip reading system is proposed. The system is
lexicon-free and uses purely visual cues. With only a limited number of visemes as classes to …

Cited by 63 · Related articles · All 3 versions
