MixSpeech: Cross-modality self-learning with audio-visual stream mixup for visual speech translation and recognition

X Cheng, T Jin, R Huang, L Li, W Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Multimedia communication facilitates global interaction among people. However, despite
researchers exploring cross-lingual translation techniques such as machine translation and …

Rethinking Missing Modality Learning from a Decoding Perspective

T Jin, X Cheng, L Li, W Lin, Y Wang… - Proceedings of the 31st …, 2023 - dl.acm.org
The conventional pipeline of multimodal learning consists of three stages: encoding,
fusion, and decoding. Most existing methods for the missing-modality condition focus on the …

Fingerspelling PoseNet: Enhancing Fingerspelling Translation with Pose-Based Transformer Models

P Fayyazsanavi, N Nejatishahidin… - Proceedings of the …, 2024 - openaccess.thecvf.com
We address the task of American Sign Language fingerspelling translation using videos in
the wild. We exploit advances in more accurate hand pose estimation and propose a novel …

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation

X Cheng, R Huang, L Li, T Jin, Z Wang, A Yin… - arXiv preprint arXiv …, 2023 - arxiv.org
Direct speech-to-speech translation achieves high-quality results through the introduction of
discrete units obtained from self-supervised learning. This approach circumvents delays and …

PLAES: Prompt-generalized and Level-aware Learning Framework for Cross-prompt Automated Essay Scoring

Y Chen, X Li - Proceedings of the 2024 Joint International …, 2024 - aclanthology.org
Current cross-prompt automated essay scoring (AES) systems are primarily concerned with
obtaining knowledge shared with the target prompt by using the source and target …