A review on multimodal zero‐shot learning

W Cao, Y Wu, Y Sun, H Zhang, J Ren… - … : Data Mining and …, 2023 - Wiley Online Library
Multimodal learning offers a path to fully utilize all types of information related to the
modeling target, giving the model a global view. Zero‐shot learning (ZSL) is a …

One country, 700+ languages: NLP challenges for underrepresented languages and dialects in Indonesia

AF Aji, GI Winata, F Koto, S Cahyawijaya… - arXiv preprint arXiv …, 2022 - arxiv.org
NLP research is impeded by a lack of resources and awareness of the challenges presented
by underrepresented languages and dialects. Focusing on the languages spoken in …

Multimodal information bottleneck: Learning minimal sufficient unimodal and multimodal representations

S Mai, Y Zeng, H Hu - IEEE Transactions on Multimedia, 2022 - ieeexplore.ieee.org
Learning effective joint embeddings for cross-modal data has long been a focus of
multimodal machine learning. We argue that during multimodal fusion, the generated …

AdaptSum: Towards low-resource domain adaptation for abstractive summarization

T Yu, Z Liu, P Fung - arXiv preprint arXiv:2103.11332, 2021 - arxiv.org
State-of-the-art abstractive summarization models generally rely on extensive labeled data,
which lowers their generalization ability on domains where such data are not available. In …

Vision guided generative pre-trained language models for multimodal abstractive summarization

T Yu, W Dai, Z Liu, P Fung - arXiv preprint arXiv:2109.02401, 2021 - arxiv.org
Multimodal abstractive summarization (MAS) models that summarize videos (vision
modality) and their corresponding transcripts (text modality) are able to extract the essential …

An emoji-aware multitask framework for multimodal sarcasm detection

DS Chauhan, GV Singh, A Arora, A Ekbal… - Knowledge-Based …, 2022 - Elsevier
Sarcasm is a form of implicit emotion and requires additional information, such as context and
multimodality, for better detection. But sometimes this additional information also fails to help …

Multimodal end-to-end sparse model for emotion recognition

W Dai, S Cahyawijaya, Z Liu, P Fung - arXiv preprint arXiv:2103.09666, 2021 - arxiv.org
Existing works on multimodal affective computing tasks, such as emotion recognition,
generally adopt a two-phase pipeline, first extracting feature representations for each single …

Multi-label multimodal emotion recognition with transformer-based fusion and emotion-level representation learning

HD Le, GS Lee, SH Kim, S Kim, HJ Yang - IEEE Access, 2023 - ieeexplore.ieee.org
Emotion recognition has been an active research area for a long time. Recently, multimodal
emotion recognition from video data has grown in importance with the explosion of video …

The weighted cross-modal attention mechanism with sentiment prediction auxiliary task for multimodal sentiment analysis

Q Chen, G Huang, Y Wang - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
The human brain extracts spatial and temporal semantic information by processing multiple
modalities, which is contextually meaningful for perceiving and understanding the …

Multi-channel weight-sharing autoencoder based on cascade multi-head attention for multimodal emotion recognition

J Zheng, S Zhang, Z Wang, X Wang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Multimodal emotion recognition is challenging because of the heterogeneity gap among
different modalities. Owing to their powerful feature-abstraction ability, deep neural networks …