Deep learning-based multimodal emotion recognition from audio, visual, and text modalities: A systematic review of recent advancements and future prospects

S Zhang, Y Yang, C Chen, X Zhang, Q Leng… - Expert Systems with …, 2023 - Elsevier
Emotion recognition has recently attracted extensive interest due to its significant
applications to human-computer interaction. The expression of human emotion depends on …

NusaCrowd: Open source initiative for Indonesian NLP resources

S Cahyawijaya, H Lovenia, AF Aji… - Findings of the …, 2023 - aclanthology.org
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for
Indonesian languages, including opening access to previously non-public resources …

One country, 700+ languages: NLP challenges for underrepresented languages and dialects in Indonesia

AF Aji, GI Winata, F Koto, S Cahyawijaya… - arXiv preprint arXiv …, 2022 - arxiv.org
NLP research is impeded by a lack of resources and awareness of the challenges presented
by underrepresented languages and dialects. Focusing on the languages spoken in …

Advances in deep learning-oriented multimodal emotion recognition research

赵小明, 杨轶娇, 张石清 - … of Frontiers of Computer Science & …, 2022 - search.ebscohost.com
Multimodal emotion recognition refers to identifying a person's emotional state from different modalities associated with human emotional expression, such as speech, vision, and text. This research is of significant importance in human-computer interaction, artificial intelligence, affective computing, and related fields, and has attracted considerable attention from researchers …

Negative object presence evaluation (nope) to measure object hallucination in vision-language models

H Lovenia, W Dai, S Cahyawijaya, Z Ji… - arXiv preprint arXiv …, 2023 - arxiv.org
Object hallucination poses a significant challenge in vision-language (VL) models, often
leading to the generation of nonsensical or unfaithful responses with non-existent objects …

Multimodal emotion detection via attention-based fusion of extracted facial and speech features

D Mamieva, AB Abdusalomov, A Kutlimuratov… - Sensors, 2023 - mdpi.com
Methods for detecting emotions that employ many modalities at the same time have been
found to be more accurate and resilient than those that rely on a single sense. This is due to …

Vision guided generative pre-trained language models for multimodal abstractive summarization

T Yu, W Dai, Z Liu, P Fung - arXiv preprint arXiv:2109.02401, 2021 - arxiv.org
Multimodal abstractive summarization (MAS) models that summarize videos (vision
modality) and their corresponding transcripts (text modality) are able to extract the essential …

M-SENA: An integrated platform for multimodal sentiment analysis

H Mao, Z Yuan, H Xu, W Yu, Y Liu, K Gao - arXiv preprint arXiv …, 2022 - arxiv.org
M-SENA is an open-sourced platform for Multimodal Sentiment Analysis. It aims to facilitate
advanced research by providing flexible toolkits, reliable benchmarks, and intuitive …

A Facial Expression-Aware Multimodal Multi-task Learning Framework for Emotion Recognition in Multi-party Conversations

W Zheng, J Yu, R Xia, S Wang - … of the 61st Annual Meeting of the …, 2023 - aclanthology.org
Multimodal Emotion Recognition in Multiparty Conversations (MERMC) has
recently attracted considerable attention. Due to the complexity of visual scenes in multi …

Enhancing speech emotion recognition using dual feature extraction encoders

I Pulatov, R Oteniyazov, F Makhmudov, YI Cho - Sensors, 2023 - mdpi.com
Understanding and identifying emotional cues in human speech is a crucial aspect of
human–computer communication. The application of computer technology in dissecting and …