Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder

H Wang, P Guo, X Wan, H Zhou, L Xie - arXiv preprint arXiv:2404.05466, 2024 - arxiv.org
Automatic lip-reading (ALR) aims to automatically transcribe spoken content from a
speaker's silent lip motion captured in video. Current mainstream lip-reading approaches …

The NPU-ASLP System Description for Visual Speech Recognition in CNVSRC 2024

H Wang, L Xie - arXiv preprint arXiv:2408.02369, 2024 - arxiv.org
This paper delineates the visual speech recognition (VSR) system introduced by the
NPU-ASLP (Team 237) in the second Chinese Continuous Visual Speech Recognition Challenge …