Speechreading using probabilistic models

G Potamianos, C Neti, G Gravier, A Garg… - Proceedings of the …, 2003 - ieeexplore.ieee.org

Visual speech information from the speaker's mouth region has been successfully shown to
improve noise robustness of automatic speech recognizers, thus promising to extend their …

Cited by 962 Related articles All 15 versions

[PDF] academia.edu

[PDF] Audio-visual automatic speech recognition: An overview

G Potamianos, C Neti, J Luettin… - Issues in visual and audio …, 2004 - academia.edu

We have made significant progress in automatic speech recognition (ASR) for well-defined
applications like dictation and medium vocabulary transaction processing tasks in relatively …

Cited by 503 Related articles All 5 versions

[PDF] sciencedirect.com

Diffusion maps

RR Coifman, S Lafon - Applied and computational harmonic analysis, 2006 - Elsevier

In this paper, we provide a framework based upon diffusion processes for finding meaningful
geometric descriptions of data sets. We show that eigenfunctions of Markov matrices can be …

Cited by 3835 Related articles All 19 versions

[PDF] academia.edu

Audio-visual speech modeling for continuous speech recognition

S Dupont, J Luettin - IEEE transactions on multimedia, 2000 - ieeexplore.ieee.org

This paper describes a speech recognition system that uses both acoustic and visual
speech information to improve recognition performance in noisy environments. The system …

Cited by 811 Related articles All 11 versions

[PDF] academia.edu

Extraction of visual features for lipreading

I Matthews, TF Cootes, JA Bangham… - … on Pattern Analysis …, 2002 - ieeexplore.ieee.org

The multimodal nature of speech is often ignored in human-computer interaction, but lip
deformations and other body motion, such as those of the head, convey additional …

Cited by 717 Related articles All 17 versions

[PDF] psu.edu

Data fusion and multicue data matching by diffusion maps

S Lafon, Y Keller, RR Coifman - IEEE Transactions on pattern …, 2006 - ieeexplore.ieee.org

Data fusion and multicue data matching are fundamental tasks of high-dimensional data
analysis. In this paper, we apply the recently introduced diffusion framework to address …

Cited by 417 Related articles All 19 versions

[PDF] epfl.ch

[PDF] Audio visual speech recognition

C Neti, G Potamianos, J Luettin, I Matthews, H Glotin… - 2000 - infoscience.epfl.ch

We have made significant progress in automatic speech recognition (ASR) for well-defined
applications like dictation and medium vocabulary transaction processing tasks in relatively …

Cited by 376 Related articles All 17 versions

[PDF] academia.edu

A review of speech-based bimodal recognition

CC Chibelushi, F Deravi… - IEEE transactions on …, 2002 - ieeexplore.ieee.org

Speech recognition and speaker recognition by machine are crucial ingredients for many
important applications such as natural and flexible human-machine interfaces. Most …

Cited by 310 Related articles All 12 versions

[PDF] arxiv.org

Lipformer: learning to lipread unseen speakers based on visual-landmark transformers

F Xue, Y Li, D Liu, Y Xie, L Wu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Lipreading refers to understanding and further translating the speech of a video speaker into
textual outputs. State-of-the-art lipreading methods excel in interpreting overlap speakers, i.e. …

Cited by 14 Related articles All 4 versions

[PDF] arxiv.org

Pushing the boundaries of audiovisual word recognition using residual networks and LSTMs

T Stafylakis, MH Khan, G Tzimiropoulos - Computer Vision and Image …, 2018 - Elsevier

Visual and audiovisual speech recognition are witnessing a renaissance which is largely
due to the advent of deep learning methods. In this paper, we present a deep learning …

Cited by 73 Related articles All 6 versions
