Robust audio-visual Mandarin speech recognition based on adaptive decision fusion and tone features

H Chen, J Du, Y Dai, CH Lee… - Proceedings of the …, 2022 - research.tudelft.nl

In this paper, we present the updated Audio-Visual Speech Recognition (AVSR) corpus of
MISP2021 challenge, a large-scale audio-visual Chinese conversational corpus consisting …

Cited by 27 Related articles All 7 versions

[PDF] arxiv.org

Improving audio-visual speech recognition by lip-subword correlation based visual pre-training and cross-modal fusion encoder

Y Dai, H Chen, J Du, X Ding, N Ding… - … on Multimedia and …, 2023 - ieeexplore.ieee.org

In recent research, slight performance improvement is observed from automatic speech
recognition systems to audio-visual speech recognition systems in end-to-end frameworks …

Cited by 6 Related articles All 4 versions

Multi-Scale Hybrid Fusion Network for Mandarin Audio-Visual Speech Recognition

J Wang, Z Guo, C Yang, X Li… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org

Compared to feature or decision fusion, hybrid fusion can beneficially improve audio-visual
speech recognition accuracy. Existing works are mainly prone to design the multi-modality …

Cited by 2 Related articles All 3 versions

Cited by 2 Related articles All 2 versions

[PDF] ssrn.com

Robust Hybrid Fusion Audio-Visual Speech Recognition Based on Frequency Domain Data Pre-Processing

J Wang, Z Guo, X Li, C Hu, A Xue - Available at SSRN 4691084 - papers.ssrn.com

Traditional audio-visual data processing methods usually use greyscale images and MFCC
processing, but such methods result in the loss of information. In addition, existing works are …
