Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

A systematic literature review on multimodal machine learning: Applications, challenges, gaps and future directions

A Barua, MU Ahmed, S Begum - IEEE Access, 2023 - ieeexplore.ieee.org
Multimodal machine learning (MML) is a tempting multidisciplinary research area where
heterogeneous data from multiple modalities and machine learning (ML) are combined to …

Wit: Wikipedia-based image text dataset for multimodal multilingual machine learning

K Srinivasan, K Raman, J Chen, M Bendersky… - Proceedings of the 44th …, 2021 - dl.acm.org
The milestone improvements brought about by deep representation learning and pre-
training techniques have led to large performance gains across downstream NLP, IR and …

A novel graph-based multi-modal fusion encoder for neural machine translation

Y Yin, F Meng, J Su, C Zhou, Z Yang, J Zhou… - arXiv preprint arXiv …, 2020 - arxiv.org
Multi-modal neural machine translation (NMT) aims to translate source sentences into a
target language paired with images. However, dominant multi-modal NMT models do not …

Trends in integration of vision and language research: A survey of tasks, datasets, and methods

A Mogadala, M Kalimuthu, D Klakow - Journal of Artificial Intelligence …, 2021 - jair.org
Abstract Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …

Uc2: Universal cross-lingual cross-modal vision-and-language pre-training

M Zhou, L Zhou, S Wang, Y Cheng… - Proceedings of the …, 2021 - openaccess.thecvf.com
Vision-and-language pre-training has achieved impressive success in learning multimodal
representations between vision and language. To generalize this success to non-English …

Neural natural language generation: A survey on multilinguality, multimodality, controllability and learning

E Erdem, M Kuyu, S Yagcioglu, A Frank… - Journal of Artificial …, 2022 - jair.org
Developing artificial learning systems that can understand and generate natural language
has been one of the long-standing goals of artificial intelligence. Recent decades have …

Dynamic context-guided capsule network for multimodal machine translation

H Lin, F Meng, J Su, Y Yin, Z Yang, Y Ge… - Proceedings of the 28th …, 2020 - dl.acm.org
Multimodal machine translation (MMT), which mainly focuses on enhancing text-only
translation with visual features, has attracted considerable attention from both computer …

Grounding'grounding'in NLP

KR Chandu, Y Bisk, AW Black - arXiv preprint arXiv:2106.02192, 2021 - arxiv.org
The NLP community has seen substantial recent interest in grounding to facilitate interaction
between language technologies and the world. However, as a community, we use the term …

Multimodality information fusion for automated machine translation

L Li, T Tayir, Y Han, X Tao, JD Velásquez - Information Fusion, 2023 - Elsevier
Abstract Machine translation is a popular automation approach for translating texts between
different languages. Although traditionally it has a strong focus on natural language, images …