Lip Graph Assisted Audio-Visual Speech Recognition Using Bidirectional Synchronous Fusion.

C Sheng, G Kuang, L Bai, C Hou, Y Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …

Cited by 41 - Related articles - All 9 versions

Analyzing lower half facial gestures for lip reading applications: Survey on vision techniques

SJ Preethi - Computer Vision and Image Understanding, 2023 - Elsevier

Lip reading has gained popularity due to the proliferation of emerging real-world
applications. This article provides a comprehensive review of benchmark datasets available …

Cited by 8 - Related articles - All 2 versions

Accurate and resource-efficient lipreading with efficientnetv2 and transformers

A Koumparoulis, G Potamianos - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

We present a novel resource-efficient end-to-end architecture for lipreading that achieves
state-of-the-art results on a popular and challenging benchmark. In particular, we make the …

Cited by 28 - Related articles - All 3 versions

Building function recognition using the semi-supervised classification

X Xie, Y Liu, Y Xu, Z He, X Chen, X Zheng, Z Xie - Applied Sciences, 2022 - mdpi.com

The functional classification of buildings is important for creating and managing urban zones
and assisting government departments. Building function recognition is incredibly valuable …

Cited by 11 - Related articles - All 4 versions

Importance-aware information bottleneck learning paradigm for lip reading

C Sheng, L Liu, W Deng, L Bai, Z Liu… - IEEE Transactions …, 2022 - ieeexplore.ieee.org

Lip reading is the task of decoding text from speakers' mouth movements. Numerous deep
learning-based methods have been proposed to address this task. However, these existing …

Cited by 5 - Related articles - All 4 versions

Audio-visual fusion network based on conformer for multimodal emotion recognition

P Guo, Z Chen, Y Li, H Liu - CAAI International Conference on Artificial …, 2022 - Springer

Audio-visual emotion recognition aims to integrate audio and visual information for accurate
emotion prediction, which is widely used in real application scenarios. However, most …

Cited by 6 - Related articles - All 2 versions

Another Point of View on Visual Speech Recognition

B Pouthier, L Pilati, G Valenti, C Bouveyron… - INTERSPEECH …, 2023 - hal.science

Standard Visual Speech Recognition (VSR) systems directly process images as input
features without any a priori link between raw pixel data and facial traits. Pixel information is …

Cited by 1 - Related articles - All 6 versions

Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries.

H Chen, J Du, Y Hu, LR Dai, BC Yin, CH Lee - Interspeech, 2021 - staff.ustc.edu.cn

In this paper, we propose a novel deep learning architecture for improving word-level
lip-reading. We first incorporate multiscale processing into spatial feature extraction for lip …

Cited by 11 - Related articles - All 5 versions

Synchronous Analysis of Speech Production and Lips Movement to Detect Parkinson's Disease Using Deep Learning Methods

CD Ríos-Urrego, D Escobar-Grisales… - Diagnostics, 2024 - mdpi.com

Background/Objectives: Parkinson's disease (PD) affects more than 6 million people
worldwide. Its accurate diagnosis and monitoring are key factors to reduce its economic …

Collaborative Viseme Subword and End-to-end Modeling for Word-level Lip Reading

H Chen, Q Wang, J Du, GS Wan… - IEEE Transactions …, 2024 - ieeexplore.ieee.org

We propose a viseme subword modeling (VSM) approach to improve the generalizability
and interpretability capabilities of deep neural network based lip reading. A comprehensive …

Cited by 1 - Related articles
