Dawn of the transformer era in speech emotion recognition: closing the valence gap

J Wagner, A Triantafyllopoulos… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Recent advances in transformer-based architectures have shown promise in several
machine learning tasks. In the audio domain, such architectures have been successfully …

Probing speech emotion recognition transformers for linguistic knowledge

A Triantafyllopoulos, J Wagner, H Wierstorf… - arXiv preprint arXiv …, 2022 - arxiv.org
Large, pre-trained neural networks consisting of self-attention layers (transformers) have
recently achieved state-of-the-art results on several speech emotion recognition (SER) …

Deep learning approaches for bimodal speech emotion recognition: Advancements, challenges, and a multi-learning model

S Kakuba, A Poulose, DS Han - IEEE Access, 2023 - ieeexplore.ieee.org
Though acoustic speech emotion recognition has been studied for a while, bimodal speech
emotion recognition using both acoustic and text has gained momentum since speech …

Addressing data scarcity in speech emotion recognition: A comprehensive review

S Kakuba, DS Han - ICT Express, 2024 - Elsevier
Speech emotion recognition (SER) is a critical field within affective computing, aiming to
detect and classify emotional states from speech signals, which vary dynamically over time …

Zero-shot personalization of speech foundation models for depressed mood monitoring

M Gerczuk, A Triantafyllopoulos, S Amiriparian… - Patterns, 2023 - cell.com
The monitoring of depressed mood plays an important role as a diagnostic tool in
psychotherapy. An automated analysis of speech can provide a non-invasive measurement …

The MERSA Dataset and a Transformer-Based Approach for Speech Emotion Recognition

E Zhang, R Trujillo, C Poellabauer - … of the 62nd Annual Meeting of …, 2024 - aclanthology.org
Research in the field of speech emotion recognition (SER) relies on the availability of
comprehensive datasets to make it possible to design accurate emotion detection models …

Fatigue prediction in outdoor running conditions using audio data

A Triantafyllopoulos, S Ottl, A Gebhard… - 2022 44th Annual …, 2022 - ieeexplore.ieee.org
Although running is a common leisure activity and a core training regimen for several
athletes, between 29% and 79% of runners sustain an overuse injury each year. These …

A Residual Multi-Scale Convolutional Neural Network with Transformers for Speech Emotion Recognition

T Yan, H Meng, E Parada-Cabaleiro… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
The great variety of human emotional expression as well as the differences in the ways they
perceive and annotate them make Speech Emotion Recognition (SER) an ambiguous and …

A residual multi-scale convolutional transformer network with chunk-level log-mel spectrograms for speech emotion recognition

T Yan, H Meng, E Parada-Cabaleiro, J Tao… - Authorea …, 2023 - techrxiv.org
The great variety of human emotional expression as well as the differences in the ways they
perceive and annotate them make Speech Emotion Recognition (SER) an ambiguous and …

Audio Enhancement for Computer Audition—An Iterative Training Paradigm Using Sample Importance

M Milling, S Liu, A Triantafyllopoulos, I Aslan… - Journal of Computer …, 2024 - Springer
Neural network models for audio tasks, such as automatic speech recognition (ASR) and
acoustic scene classification (ASC), are susceptible to noise contamination for real-life …