VisTFC: Vision-guided target-side future context learning for neural machine translation

S Zhu, S Li, D Xiong - Expert Systems with Applications, 2024 - Elsevier
Visual features encompass visual information extracted from images or videos, serving as
supplementary input to enhance the efficacy of neural machine translation (NMT) systems …

BigVideo: A large-scale video subtitle translation dataset for multimodal machine translation

L Kang, L Huang, N Peng, P Zhu, Z Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a large-scale video subtitle translation dataset, BigVideo, to facilitate the study of
multi-modality machine translation. Compared with the widely used How2 and VaTeX …

Virtual visual-guided domain-shadow fusion via modal exchanging for domain-specific multi-modal neural machine translation

Z Hou, J Guo - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
Incorporating domain-specific visual information into text poses one of the critical challenges
for domain-specific multi-modal neural machine translation (DMNMT). While most existing …

LAMBDA: Large Language Model-Based Data Augmentation for Multi-Modal Machine Translation

Y Wang, D Li, J Shen, Y Xu, M Xu… - Findings of the …, 2024 - aclanthology.org
Multi-modal machine translation (MMT) can reduce ambiguity and semantic distortion
compared with traditional machine translation (MT) by utilizing auxiliary information such as …

Controlling styles in neural machine translation with activation prompt

Y Wang, Z Sun, S Cheng, W Zheng, M Wang - arXiv preprint arXiv …, 2022 - arxiv.org
Controlling styles in neural machine translation (NMT) has attracted wide attention, as it is
crucial for enhancing user experience. Earlier studies on this topic typically concentrate on …

Progressive modality-complement aggregative multitransformer for domain multi-modal neural machine translation

J Guo, Z Hou, Y Xian, Z Yu - Pattern Recognition, 2024 - Elsevier
Domain-specific Multi-modal Neural Machine Translation (DMNMT) aims to
translate domain-specific sentences from a source language to a target language by …

Multi-grained visual pivot-guided multi-modal neural machine translation with text-aware cross-modal contrastive disentangling

J Guo, R Su, J Ye - Neural Networks, 2024 - Elsevier
The goal of multi-modal neural machine translation (MNMT) is to incorporate language-
agnostic visual information into text to enhance the performance of machine translation …

TriFine: A Large-Scale Dataset of Vision-Audio-Subtitle for Tri-Modal Machine Translation and Benchmark with Fine-Grained Annotated Tags

B Guan, Y Zhang, Y Zhao, C Zong - Proceedings of the 31st …, 2025 - aclanthology.org
Current video-guided machine translation (VMT) approaches primarily use coarse-grained
visual information, resulting in information redundancy, high computational overhead, and …

A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges

H Shen, L Shao, W Li, Z Lan, Z Liu, J Su - arXiv preprint arXiv:2405.12669, 2024 - arxiv.org
In recent years, multi-modal machine translation has attracted significant interest in both
academia and industry due to its superior performance. It takes both textual and visual …

Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation

A Chen, Y Song, K Chen, M Yang, T Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Visual information has been introduced for enhancing machine translation (MT), and its
effectiveness heavily relies on the availability of large amounts of bilingual parallel sentence …