Audio-visual speech recognition in MISP2021 challenge: Dataset release and deep analysis

H Chen, J Du, Y Dai, CH Lee… - Proceedings of the …, 2022 - research.tudelft.nl
In this paper, we present the updated Audio-Visual Speech Recognition (AVSR) corpus of
MISP2021 challenge, a large-scale audio-visual Chinese conversational corpus consisting …

Improving audio-visual speech recognition by lip-subword correlation based visual pre-training and cross-modal fusion encoder

Y Dai, H Chen, J Du, X Ding, N Ding… - … on Multimedia and …, 2023 - ieeexplore.ieee.org
In recent research, slight performance improvement is observed from automatic speech
recognition systems to audio-visual speech recognition systems in end-to-end frameworks …

Multi-Scale Hybrid Fusion Network for Mandarin Audio-Visual Speech Recognition

J Wang, Z Guo, C Yang, X Li… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Compared to feature or decision fusion, hybrid fusion can beneficially improve audio-visual
speech recognition accuracy. Existing works are mainly prone to design the multi-modality …

DuAGNet: an unrestricted multimodal speech recognition framework using dual adaptive gating fusion

J Wu, Y Zhang, M Zhang, C Zheng, X Zhang, L Xie… - Applied …, 2025 - Springer
Speech recognition is a major communication channel for human-machine interaction with
outstanding breakthroughs. However, the practicality of single-modal speech recognition is …

Enhanced Conformer-Based Speech Recognition via Model Fusion and Adaptive Decoding with Dynamic Rescoring

J Geng, D Jia, Z He, N Wu, Z Li - Applied Sciences, 2024 - mdpi.com
Speech recognition is widely applied in fields like security, education, and healthcare. While
its development drives global information infrastructure and AI strategies, current models still …

An End-to-End Mandarin Audio-Visual Speech Recognition Model with a Feature Enhancement Module

J Wang, C Yang, Z Guo, X Li… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Compared to relying only on audio information, incorporating visual information improves
speech recognition accuracy in noisy environments. Existing works are prone to design …

An Investigation into Audio–Visual Speech Recognition under a Realistic Home–TV Scenario

B Yin, S Niu, H Tang, L Sun, J Du, Z Ling, C Liu - Applied Sciences, 2023 - mdpi.com
Robust speech recognition in real world situations is still an important problem, especially
when it is affected by environmental interference factors and conversational multi-speaker …

Robust Hybrid Fusion Audio-Visual Speech Recognition Based on Frequency Domain Data Pre-Processing

J Wang, Z Guo, X Li, C Hu, A Xue - Available at SSRN 4691084 - papers.ssrn.com
Traditional audio-visual data processing methods usually use greyscale images and MFCC
processing, but such methods result in the loss of information. In addition, existing works are …