MuSe 2023 challenge: Multimodal prediction of mimicked emotions, cross-cultural humour, and personalised recognition of affects

S Amiriparian, L Christ, A König, A Cowen… - Proceedings of the 31st …, 2023 - dl.acm.org
The 4th Multimodal Sentiment Analysis Challenge (MuSe) focuses on Multimodal Prediction
of Mimicked Emotions, Cross-Cultural Humour, and Personalised Recognition of Affects …

Effect of attention and self-supervised speech embeddings on non-semantic speech tasks

P Mohapatra, A Pandey, Y Sui, Q Zhu - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Human emotion understanding is pivotal in making conversational technology mainstream.
We view speech emotion understanding as a perception task which is a more realistic …

Cascaded Cross-Modal Transformer for Request and Complaint Detection

NC Ristea, RT Ionescu - Proceedings of the 31st ACM International …, 2023 - dl.acm.org
We propose a novel cascaded cross-modal transformer (CCMT) that combines speech and
text transcripts to detect customer requests and complaints in phone conversations. Our …

Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference

D Porjazovski, Y Getman, T Grósz… - Proceedings of the 31st …, 2023 - dl.acm.org
Large pre-trained models are essential in paralinguistic systems, demonstrating
effectiveness in tasks like emotion recognition and stuttering detection. In this paper, we …

Automatic Audio Augmentation for Requests Sub-Challenge

Y Sun, K Xu, C Liu, Y Dou, K Qian - Proceedings of the 31st ACM …, 2023 - dl.acm.org
This paper presents our solution for the Requests Sub-challenge of the ACM Multimedia
2023 Computational Paralinguistics Challenge. Drawing upon the framework of self …

Cascaded cross-modal transformer for audio–textual classification

NC Ristea, A Anghel, RT Ionescu - Artificial Intelligence Review, 2024 - Springer
Speech classification tasks often require powerful language understanding models to grasp
useful features, which becomes problematic when limited training data is available. To attain …

Ensembling multilingual pre-trained models for predicting multi-label regression emotion share from speech

BT Atmaja, A Sasou - 2023 Asia Pacific Signal and Information …, 2023 - ieeexplore.ieee.org
Speech emotion recognition has evolved from research to practical applications. Previous
studies of emotion recognition from speech have focused on developing models on certain …

Multi-Layer Acoustic & Linguistic Feature Fusion for ComParE-23 Emotion and Requests Challenge

SR Viksit, V Abrol - Proceedings of the 31st ACM International …, 2023 - dl.acm.org
The ACM Multimedia 2023 ComParE challenge focuses on classification/regression tasks
for spoken customer-agent and emotionally rated conversations. The challenge baseline …

From Raw Speech to Fixed Representations: A Comprehensive Evaluation of Speech Embedding Techniques

D Porjazovski, T Grósz, M Kurimo - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Speech embeddings, fixed-size representations derived from raw audio data, play a crucial
role in diverse machine learning applications. Despite the abundance of speech embedding …

Wav2vec 2.0 Embeddings Are No Swiss Army Knife – A Case Study for Multiple Sclerosis

G Gosztolya, M Vetráb, V Svindt, J Bóna… - Proc. Interspeech …, 2024 - isca-archive.org
In the past few years, self-supervised learning has revolutionized automatic speech
recognition. Self-supervised models such as wav2vec2, due to their generalization ability on …