CLIPPO: Image-and-language understanding from pixels only

M Tschannen, B Mustafa… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Multimodal models are becoming increasingly effective, in part due to unified components,
such as the Transformer architecture. However, multimodal models still often consist of many …

Between words and characters: A brief history of open-vocabulary modeling and tokenization in NLP

SJ Mielke, Z Alyafeai, E Salesky, C Raffel… - arXiv preprint arXiv …, 2021 - arxiv.org
What are the units of text that we want to model? From bytes to multi-word expressions, text
can be analyzed and generated at many granularities. Until recently, most natural language …

Modal contrastive learning based end-to-end text image machine translation

C Ma, X Han, L Wu, Y Zhang, Y Zhao… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Text image machine translation (TIMT) aims at directly translating text in the source
language embedded in images into the target language. Most existing systems follow the …

Robust open-vocabulary translation from visual text representations

E Salesky, D Etter, M Post - arXiv preprint arXiv:2104.08211, 2021 - arxiv.org
Machine translation models have discrete vocabularies and commonly use subword
segmentation techniques to achieve an 'open vocabulary.' This approach relies on consistent …

Multi-teacher knowledge distillation for end-to-end text image machine translation

C Ma, Y Zhang, M Tu, Y Zhao, Y Zhou… - … Conference on Document …, 2023 - Springer
Text image machine translation (TIMT) has been widely used in various real-world
applications, which translates source language texts in images into another target language …

Exploring better text image translation with multimodal codebook

Z Lan, J Yu, X Li, W Zhang, J Luan, B Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Text image translation (TIT) aims to translate the source texts embedded in the image to
target translations, which has a wide range of applications and thus has important research …

Improving end-to-end text image translation from the auxiliary text translation task

C Ma, Y Zhang, M Tu, X Han, L Wu… - 2022 26th …, 2022 - ieeexplore.ieee.org
End-to-end text image translation (TIT), which aims at translating the source language
embedded in images to the target language, has attracted intensive attention in recent …

E2TIMT: Efficient and effective modal adapter for text image machine translation

C Ma, Y Zhang, M Tu, Y Zhao, Y Zhou… - … Conference on Document …, 2023 - Springer
Text image machine translation (TIMT) aims to translate texts embedded in images from one
source language to another target language. Existing methods, both two-stage cascade and …

PEIT: Bridging the Modality Gap with Pre-trained Models for End-to-End Image Translation

S Zhu, S Li, Y Lei, D Xiong - … of the 61st Annual Meeting of the …, 2023 - aclanthology.org
Image translation is a task that translates an image containing text in the source language to
the target language. One major challenge with image translation is the modality gap …

CCIM: Cross-modal Cross-lingual Interactive Image Translation

C Ma, Y Zhang, M Tu, Y Zhao, Y Zhou… - Findings of the …, 2023 - aclanthology.org
Text image machine translation (TIMT) which translates source language text images into
target language texts has attracted intensive attention in recent years. Although the end-to …