BigVideo: A large-scale video subtitle translation dataset for multimodal machine translation

Z Lan, L Niu, F Meng, J Zhou, M Zhang, J Su - arXiv preprint arXiv …, 2024 - arxiv.org

In-image machine translation (IIMT) aims to translate an image containing texts in source
language into an image containing translations in target language. In this regard …

Cited by 5 · Related articles · All 4 versions

Exploring better text image translation with multimodal codebook

Z Lan, J Yu, X Li, W Zhang, J Luan, B Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Text image translation (TIT) aims to translate the source texts embedded in the image to
target translations, which has a wide range of applications and thus has important research …

Cited by 14 · Related articles · All 5 versions

Towards better multi-modal keyphrase generation via visual entity enhancement and multi-granularity image noise filtering

Y Dong, S Wu, F Meng, J Zhou, X Wang, J Lin… - Proceedings of the 31st …, 2023 - dl.acm.org

Multi-modal keyphrase generation aims to produce a set of keyphrases that represent the
core points of the input text-image pair. In this regard, dominant methods mainly focus on …

Cited by 3 · Related articles · All 3 versions

TriFine: A Large-Scale Dataset of Vision-Audio-Subtitle for Tri-Modal Machine Translation and Benchmark with Fine-Grained Annotated Tags

B Guan, Y Zhang, Y Zhao, C Zong - Proceedings of the 31st …, 2025 - aclanthology.org

Current video-guided machine translation (VMT) approaches primarily use coarse-grained
visual information, resulting in information redundancy, high computational overhead, and …

A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges

H Shen, L Shao, W Li, Z Lan, Z Liu, J Su - arXiv preprint arXiv:2405.12669, 2024 - arxiv.org

In recent years, multi-modal machine translation has attracted significant interest in both
academia and industry due to its superior performance. It takes both textual and visual …

Cited by 2 · Related articles · All 2 versions

Research on Tibetan-Chinese Machine Translation Method Based on Graphic Multimodal Fusion Alignment

C He, Q Gesang, N Qun, G Luosang… - 2024 6th International …, 2024 - ieeexplore.ieee.org

This article explores a Tibetan-Chinese machine translation model based on multimodal
alignment of images and texts, using the Resnet50 model for feature extraction of images …

The Effects of Pretraining in Video-Guided Machine Translation

A Shurtz, L Sorenson… - Proceedings of the 2024 …, 2024 - aclanthology.org

We propose an approach that improves the performance of VMT (Video-guided Machine
Translation) models, which integrate text and video modalities. We experiment with the MAD …

Cited by 1 · Related articles
