Audio-Visual Speech Recognition Using Bimodal-Trained Bottleneck Features for a Person with...

A Fernandez-Lopez, FM Sukno - Image and Vision Computing, 2018 - Elsevier

In the last few years, there has been an increasing interest in developing systems for
Automatic Lip-Reading (ALR). Similarly to other computer vision applications, methods …

Cited by 159 · Related articles · All 3 versions

[PDF] mdpi.com

Audio-visual speech and gesture recognition by sensors of mobile devices

D Ryumin, D Ivanko, E Ryumina - Sensors, 2023 - mdpi.com

Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable
speech recognition, particularly when audio is corrupted by noise. Additional visual …

Cited by 74 · Related articles · All 9 versions

[PDF] arxiv.org

LipNet: End-to-end sentence-level lipreading

YM Assael, B Shillingford, S Whiteson… - arXiv preprint arXiv …, 2016 - arxiv.org

Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional
approaches separated the problem into two stages: designing or learning visual features …

Cited by 497 · Related articles · All 6 versions

[PDF] arxiv.org

End-to-end audiovisual speech recognition

S Petridis, T Stafylakis, P Ma, F Cai… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org

Several end-to-end deep learning approaches have been recently presented which extract
either audio or visual features from the input images or audio signals and perform speech …

Cited by 337 · Related articles · All 12 versions

[PDF] ieee.org

A survey of research on lipreading technology

M Hao, M Mamut, N Yadikar, A Aysa, K Ubul - IEEE Access, 2020 - ieeexplore.ieee.org

Although automatic speech recognition (ASR) technology is mature, there are still some
unsolved problems, such as how to accurately identify what the speaker is saying in a noisy …

Cited by 50 · Related articles · All 3 versions

[PDF] arxiv.org

Large-scale visual speech recognition

B Shillingford, Y Assael, MW Hoffman, T Paine… - arXiv preprint arXiv …, 2018 - arxiv.org

This work presents a scalable solution to open-vocabulary visual speech recognition. To
achieve this, we constructed the largest existing visual speech recognition dataset …

Cited by 211 · Related articles · All 7 versions

[PDF] arxiv.org

Audio-visual speech recognition with a hybrid CTC/attention architecture

S Petridis, T Stafylakis, P Ma… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org

Recent works in speech recognition rely either on connectionist temporal classification
(CTC) or sequence-to-sequence models for character-level recognition. CTC assumes …

Cited by 162 · Related articles · All 9 versions

[PDF] innovators-guide.ch

[PDF] LipNet: Sentence-level lipreading

YM Assael, B Shillingford, S Whiteson… - arXiv preprint arXiv …, 2016 - innovators-guide.ch

Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional
approaches separated the problem into two stages: designing or learning visual features …

Cited by 185 · Related articles · All 5 versions

[PDF] arxiv.org

Learning spatio-temporal features with two-stream deep 3D CNNs for lipreading

X Weng, K Kitani - arXiv preprint arXiv:1905.02540, 2019 - arxiv.org

We focus on the word-level visual lipreading, which requires recognizing the word being
spoken, given only the video but not the audio. State-of-the-art methods explore the use of …

Cited by 102 · Related articles · All 6 versions

[PDF] arxiv.org

End-to-end audiovisual fusion with LSTMs

S Petridis, Y Wang, Z Li, M Pantic - arXiv preprint arXiv:1709.04343, 2017 - arxiv.org

Several end-to-end deep learning approaches have been recently presented which
simultaneously extract visual features from the input images and perform visual speech …

Cited by 57 · Related articles · All 12 versions
