Deep learning for visual speech analysis: A survey

C Sheng, G Kuang, L Bai, C Hou, Y Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …

Analyzing lower half facial gestures for lip reading applications: Survey on vision techniques

SJ Preethi - Computer Vision and Image Understanding, 2023 - Elsevier
Lip reading has gained popularity due to the proliferation of emerging real-world
applications. This article provides a comprehensive review of benchmark datasets available …

Accurate and resource-efficient lipreading with efficientnetv2 and transformers

A Koumparoulis, G Potamianos - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
We present a novel resource-efficient end-to-end architecture for lipreading that achieves
state-of-the-art results on a popular and challenging benchmark. In particular, we make the …

Building function recognition using the semi-supervised classification

X Xie, Y Liu, Y Xu, Z He, X Chen, X Zheng, Z Xie - Applied Sciences, 2022 - mdpi.com
The functional classification of buildings is important for creating and managing urban zones
and assisting government departments. Building function recognition is incredibly valuable …

Importance-aware information bottleneck learning paradigm for lip reading

C Sheng, L Liu, W Deng, L Bai, Z Liu… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
Lip reading is the task of decoding text from speakers' mouth movements. Numerous deep
learning-based methods have been proposed to address this task. However, these existing …

Audio-visual fusion network based on conformer for multimodal emotion recognition

P Guo, Z Chen, Y Li, H Liu - CAAI International Conference on Artificial …, 2022 - Springer
Audio-visual emotion recognition aims to integrate audio and visual information for accurate
emotion prediction, which is widely used in real application scenarios. However, most …

Another Point of View on Visual Speech Recognition

B Pouthier, L Pilati, G Valenti, C Bouveyron… - INTERSPEECH …, 2023 - hal.science
Standard Visual Speech Recognition (VSR) systems directly process images as input
features without any a priori link between raw pixel data and facial traits. Pixel information is …

Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries

H Chen, J Du, Y Hu, LR Dai, BC Yin, CH Lee - Interspeech, 2021 - staff.ustc.edu.cn
In this paper, we propose a novel deep learning architecture for improving word-level lip-
reading. We first incorporate multiscale processing into spatial feature extraction for lip …

Synchronous Analysis of Speech Production and Lips Movement to Detect Parkinson's Disease Using Deep Learning Methods

CD Ríos-Urrego, D Escobar-Grisales… - Diagnostics, 2024 - mdpi.com
Background/Objectives: Parkinson's disease (PD) affects more than 6 million people
worldwide. Its accurate diagnosis and monitoring are key factors to reduce its economic …

Collaborative Viseme Subword and End-to-end Modeling for Word-level Lip Reading

H Chen, Q Wang, J Du, GS Wan… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
We propose a viseme subword modeling (VSM) approach to improve the generalizability
and interpretability capabilities of deep neural network based lip reading. A comprehensive …