Audiovisual speech processing

M Turk - Pattern recognition letters, 2014 - Elsevier

People naturally interact with the world multimodally, through both parallel and sequential
use of multiple perceptual modalities. Multimodal human–computer interaction has sought …

Cited by 587 - Related articles - All 6 versions

Toward an affect-sensitive multimodal human-computer interaction

M Pantic, LJM Rothkrantz - Proceedings of the IEEE, 2003 - ieeexplore.ieee.org

The ability to recognize affective states of a person we are communicating with is the core of
emotional intelligence. Emotional intelligence is a facet of human intelligence that has been …

Cited by 1118 - Related articles - All 22 versions

TCD-TIMIT: An audio-visual corpus of continuous speech

N Harte, E Gillen - IEEE Transactions on Multimedia, 2015 - ieeexplore.ieee.org

Automatic audio-visual speech recognition currently lags behind its audio-only counterpart
in terms of major progress. One of the reasons commonly cited by researchers is the scarcity …

Cited by 263 - Related articles - All 5 versions

Recent advances in the automatic recognition of audiovisual speech

G Potamianos, C Neti, G Gravier, A Garg… - Proceedings of the …, 2003 - ieeexplore.ieee.org

Visual speech information from the speaker's mouth region has been successfully shown to
improve noise robustness of automatic speech recognizers, thus promising to extend their …

Cited by 952 - Related articles - All 15 versions

Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org

Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Cited by 50 - Related articles - All 2 versions

Audio-visual automatic speech recognition: An overview

G Potamianos, C Neti, J Luettin… - Issues in visual and audio …, 2004 - academia.edu

We have made significant progress in automatic speech recognition (ASR) for well-defined
applications like dictation and medium vocabulary transaction processing tasks in relatively …

Cited by 492 - Related articles - All 5 versions

Dynamic Bayesian networks for audio-visual speech recognition

AV Nefian, L Liang, X Pi, X Liu, K Murphy - EURASIP Journal on …, 2002 - Springer

The use of visual features in audio-visual speech recognition (AVSR) is justified by both the
speech generation mechanism, which is essentially bimodal in audio and visual …

Cited by 439 - Related articles - All 17 versions

Advanced applications of neural networks and artificial intelligence: A review

K Kumar, GSM Thakur - International journal of information …, 2012 - academia.edu

(AI). It also considers the integration of neural networks with other computing methods Such
as fuzzy logic to enhance the interpretation ability of data. Artificial Neural Networks is …

Cited by 210 - Related articles - All 7 versions

Deep learning for visual speech analysis: A survey

C Sheng, G Kuang, L Bai, C Hou, Y Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …

Cited by 28 - Related articles - All 9 versions

A coupled HMM for audio-visual speech recognition

AV Nefian, L Liang, X Pi, L Xiaoxiang… - … , Speech, and Signal …, 2002 - ieeexplore.ieee.org

In recent years several speech recognition systems that use visual together with audio
information showed significant increase in performance over the standard speech …

Cited by 326 - Related articles - All 18 versions

Multimodal interaction: A review