Multimodal transformer for unaligned multimodal language sequences

YHH Tsai, S Bai, PP Liang, JZ Kolter… - Proceedings of the …, 2019 - ncbi.nlm.nih.gov
Human language is often multimodal, comprising a mixture of natural language,
facial gestures, and acoustic behaviors. However, two major challenges in modeling such …

Gated attention fusion network for multimodal sentiment classification

Y Du, Y Liu, Z Peng, X Jin - Knowledge-Based Systems, 2022 - Elsevier
Sentiment classification can explore the opinions expressed by people and help them make
better decisions. With the increase of multimodal content on the web, such as text, image …

MTAG: Modal-temporal attention graph for unaligned human multimodal language sequences

J Yang, Y Wang, R Yi, Y Zhu, A Rehman… - arXiv preprint arXiv …, 2020 - arxiv.org
Human communication is multimodal in nature; it is through multiple modalities such as
language, voice, and facial expressions, that opinions and emotions are expressed. Data in …

What makes the difference? An empirical comparison of fusion strategies for multimodal language analysis

D Gkoumas, Q Li, C Lioma, Y Yu, D Song - Information Fusion, 2021 - Elsevier
Multimodal video sentiment analysis is a rapidly growing area. It combines verbal (i.e.,
linguistic) and non-verbal modalities (i.e., visual, acoustic) to predict the sentiment of …

Excavating multimodal correlation for representation learning

S Mai, Y Sun, Y Zeng, H Hu - Information Fusion, 2023 - Elsevier
A majority of previous methods for multimodal representation learning ignore the rich
correlation information inherently stored in each sample, leading to a lack of robustness …

A quantum-like multimodal network framework for modeling interaction dynamics in multiparty conversational sentiment analysis

Y Zhang, D Song, X Li, P Zhang, P Wang, L Rong… - Information …, 2020 - Elsevier
Sentiment analysis in conversations is an emerging yet challenging artificial intelligence (AI)
task. It aims to discover the affective states and emotional changes of speakers involved in a …

Learning from the global view: Supervised contrastive learning of multimodal representation

S Mai, Y Zeng, H Hu - Information Fusion, 2023 - Elsevier
The development of technology enables the availability of abundant multimodal data, which
can be utilized in many representation learning tasks. However, most methods ignore the …

A deep learning architecture of RA-DLNet for visual sentiment analysis

A Yadav, DK Vishwakarma - Multimedia Systems, 2020 - Springer
Visual media has become one of the most potent means of conveying opinions or
sentiments on the web. Millions of photos are being uploaded by the people on famous …

Multi-channel weight-sharing autoencoder based on cascade multi-head attention for multimodal emotion recognition

J Zheng, S Zhang, Z Wang, X Wang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Multimodal Emotion Recognition is challenging because of the heterogeneity gap among
different modalities. Due to the powerful ability of feature abstraction, Deep Neural Networks …

ScaleVLAD: Improving multimodal sentiment analysis via multi-scale fusion of locally descriptors

H Luo, L Ji, Y Huang, B Wang, S Ji, T Li - arXiv preprint arXiv:2112.01368, 2021 - arxiv.org
Fusion technique is a key research topic in multimodal sentiment analysis. The recent
attention-based fusion demonstrates advances over simple operation-based fusion …