Translatotron-V(ison): An end-to-end model for in-image machine translation

Z Lan, L Niu, F Meng, J Zhou, M Zhang, J Su - arXiv preprint arXiv …, 2024 - arxiv.org
In-image machine translation (IIMT) aims to translate an image containing texts in source
language into an image containing translations in target language. In this regard …

Exploring better text image translation with multimodal codebook

Z Lan, J Yu, X Li, W Zhang, J Luan, B Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Text image translation (TIT) aims to translate the source texts embedded in the image to
target translations, which has a wide range of applications and thus has important research …

Towards better multi-modal keyphrase generation via visual entity enhancement and multi-granularity image noise filtering

Y Dong, S Wu, F Meng, J Zhou, X Wang, J Lin… - Proceedings of the 31st …, 2023 - dl.acm.org
Multi-modal keyphrase generation aims to produce a set of keyphrases that represent the
core points of the input text-image pair. In this regard, dominant methods mainly focus on …

TriFine: A Large-Scale Dataset of Vision-Audio-Subtitle for Tri-Modal Machine Translation and Benchmark with Fine-Grained Annotated Tags

B Guan, Y Zhang, Y Zhao, C Zong - Proceedings of the 31st …, 2025 - aclanthology.org
Current video-guided machine translation (VMT) approaches primarily use coarse-grained
visual information, resulting in information redundancy, high computational overhead, and …

A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges

H Shen, L Shao, W Li, Z Lan, Z Liu, J Su - arXiv preprint arXiv:2405.12669, 2024 - arxiv.org
In recent years, multi-modal machine translation has attracted significant interest in both
academia and industry due to its superior performance. It takes both textual and visual …

Research on Tibetan-Chinese Machine Translation Method Based on Graphic Multimodal Fusion Alignment

C He, Q Gesang, N Qun, G Luosang… - 2024 6th International …, 2024 - ieeexplore.ieee.org
This article explores a Tibetan-Chinese machine translation model based on multimodal
alignment of images and texts, using the ResNet50 model for feature extraction of images …

The Effects of Pretraining in Video-Guided Machine Translation

A Shurtz, L Sorenson… - Proceedings of the 2024 …, 2024 - aclanthology.org
We propose an approach that improves the performance of VMT (Video-guided Machine
Translation) models, which integrate text and video modalities. We experiment with the MAD …