MixSpeech: Cross-modality self-learning with audio-visual stream mixup for visual speech translation and recognition

X Cheng, T Jin, R Huang, L Li, W Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Multimedia communication facilitates global interaction among people. However, despite
researchers exploring cross-lingual translation techniques such as machine translation and …

Rethinking Missing Modality Learning from a Decoding Perspective

T Jin, X Cheng, L Li, W Lin, Y Wang… - Proceedings of the 31st …, 2023 - dl.acm.org
The conventional pipeline of multimodal learning consists of three stages: encoding,
fusion, and decoding. Most existing methods for the missing-modality condition focus on the …

Fingerspelling PoseNet: Enhancing Fingerspelling Translation with Pose-Based Transformer Models

P Fayyazsanavi, N Nejatishahidin… - Proceedings of the …, 2024 - openaccess.thecvf.com
We address the task of American Sign Language fingerspelling translation using videos in
the wild. We exploit advances in more accurate hand pose estimation and propose a novel …

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation

X Cheng, R Huang, L Li, T Jin, Z Wang, A Yin… - arXiv preprint arXiv …, 2023 - arxiv.org
Direct speech-to-speech translation achieves high-quality results through the introduction of
discrete units obtained from self-supervised learning. This approach circumvents delays and …

PLAES: Prompt-generalized and Level-aware Learning Framework for Cross-prompt Automated Essay Scoring

Y Chen, X Li - Proceedings of the 2024 Joint International …, 2024 - aclanthology.org
Current cross-prompt automated essay scoring (AES) systems are primarily concerned with
obtaining knowledge shared with the target prompt by using the source and target …