Text image translation (TIT) aims to translate source text embedded in an image into a target language. It has a wide range of applications and thus has important research …
Multi-modal keyphrase generation aims to produce a set of keyphrases that represent the core points of the input text-image pair. In this regard, dominant methods mainly focus on …
B Guan, Y Zhang, Y Zhao, C Zong - Proceedings of the 31st …, 2025 - aclanthology.org
Current video-guided machine translation (VMT) approaches primarily use coarse-grained visual information, resulting in information redundancy, high computational overhead, and …
H Shen, L Shao, W Li, Z Lan, Z Liu, J Su - arXiv preprint arXiv:2405.12669, 2024 - arxiv.org
In recent years, multi-modal machine translation has attracted significant interest in both academia and industry due to its superior performance. It takes both textual and visual …
C He, Q Gesang, N Qun, G Luosang… - 2024 6th International …, 2024 - ieeexplore.ieee.org
This article explores a Tibetan-Chinese machine translation model based on multimodal alignment of images and texts, using the ResNet50 model for feature extraction of images …
A Shurtz, L Sorenson… - Proceedings of the 2024 …, 2024 - aclanthology.org
We propose an approach that improves the performance of VMT (Video-guided Machine Translation) models, which integrate text and video modalities. We experiment with the MAD …