Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Multimodal sentiment analysis based on fusion methods: A survey

L Zhu, Z Zhu, C Zhang, Y Xu, X Kong - Information Fusion, 2023 - Elsevier
Sentiment analysis is an emerging technology that aims to explore people's attitudes toward
an entity. It can be applied in a variety of different fields and scenarios, such as product …

Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis

W Yu, H Xu, Z Yuan, J Wu - Proceedings of the AAAI conference on …, 2021 - ojs.aaai.org
Representation Learning is a significant and challenging task in multimodal
learning. Effective modality representations should contain two parts of characteristics: the …

Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis

W Han, H Chen, S Poria - arXiv preprint arXiv:2109.00412, 2021 - arxiv.org
In multimodal sentiment analysis (MSA), the performance of a model highly depends on the
quality of synthesized embeddings. These embeddings are generated from the upstream …

CDTrans: Cross-domain transformer for unsupervised domain adaptation

T Xu, W Chen, P Wang, F Wang, H Li, R Jin - arXiv preprint arXiv …, 2021 - arxiv.org
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a labeled
source domain to a different unlabeled target domain. Most existing UDA methods focus on …

Disentangled representation learning for multimodal emotion recognition

D Yang, S Huang, H Kuang, Y Du… - Proceedings of the 30th …, 2022 - dl.acm.org
Multimodal emotion recognition aims to identify human emotions from text, audio, and visual
modalities. Previous methods either explore correlations between different modalities or …

Decoupled multimodal distilling for emotion recognition

Y Li, Y Wang, Z Cui - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Human multimodal emotion recognition (MER) aims to perceive human emotions via
language, visual and acoustic modalities. Despite the impressive performance of previous …

Are multimodal transformers robust to missing modality?

M Ma, J Ren, L Zhao, D Testuggine… - Proceedings of the …, 2022 - openaccess.thecvf.com
Multimodal data collected from the real world are often imperfect due to missing modalities.
Therefore multimodal models that are robust against modal-incomplete data are highly …