Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Versatile diffusion: Text, images and variations all in one diffusion model

X Xu, Z Wang, G Zhang, K Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent advances in diffusion models have set an impressive milestone in many generation
tasks, and trending works such as DALL-E2, Imagen, and Stable Diffusion have attracted …

Learning robust perceptive locomotion for quadrupedal robots in the wild

T Miki, J Lee, J Hwangbo, L Wellhausen, V Koltun… - Science Robotics, 2022 - science.org
Legged robots that can operate autonomously in remote and hazardous environments will
greatly increase opportunities for exploration into underexplored areas. Exteroceptive …

A survey of vision-language pre-trained models

Y Du, Z Liu, J Li, WX Zhao - arXiv preprint arXiv:2202.10936, 2022 - arxiv.org
As transformer evolves, pre-trained models have advanced at a breakneck pace in recent
years. They have dominated the mainstream techniques in natural language processing …

Are multimodal transformers robust to missing modality?

M Ma, J Ren, L Zhao, D Testuggine… - Proceedings of the …, 2022 - openaccess.thecvf.com
Multimodal data collected from the real world are often imperfect due to missing modalities.
Therefore multimodal models that are robust against modal-incomplete data are highly …

Multimodal prompting with missing modalities for visual recognition

YL Lee, YH Tsai, WC Chiu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when
missing-modality occurs either during training or testing in real-world situations; and 2) when …

The hateful memes challenge: Detecting hate speech in multimodal memes

D Kiela, H Firooz, A Mohan… - Advances in neural …, 2020 - proceedings.neurips.cc
This work proposes a new challenge set for multimodal classification, focusing on detecting
hate speech in multimodal memes. It is constructed such that unimodal models struggle and …

SMIL: Multimodal learning with severely missing modality

M Ma, J Ren, L Zhao, S Tulyakov, C Wu… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
A common assumption in multimodal learning is the completeness of training data, i.e., full
modalities are available in all training examples. Although there exists research endeavor in …

Towards accurate scene text recognition with semantic reasoning networks

D Yu, X Li, C Zhang, T Liu, J Han… - Proceedings of the …, 2020 - openaccess.thecvf.com
Scene text image contains two levels of contents: visual texture and semantic information.
Although the previous scene text recognition methods have made great progress over the …