Dawn of the transformer era in speech emotion recognition: closing the valence gap

J Wagner, A Triantafyllopoulos… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Recent advances in transformer-based architectures have shown promise in several
machine learning tasks. In the audio domain, such architectures have been successfully …

Emonet: A transfer learning framework for multi-corpus speech emotion recognition

M Gerczuk, S Amiriparian, S Ottl… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In this manuscript, the topic of multi-corpus Speech Emotion Recognition (SER) is
approached from a deep transfer learning perspective. A large corpus of emotional speech …

Probing speech emotion recognition transformers for linguistic knowledge

A Triantafyllopoulos, J Wagner, H Wierstorf… - arXiv preprint arXiv …, 2022 - arxiv.org
Large, pre-trained neural networks consisting of self-attention layers (transformers) have
recently achieved state-of-the-art results on several speech emotion recognition (SER) …

HEAR4Health: a blueprint for making computer audition a staple of modern healthcare

A Triantafyllopoulos, A Kathan, A Baird… - Frontiers in Digital …, 2023 - frontiersin.org
Recent years have seen a rapid increase in digital medicine research in an attempt to
transform traditional healthcare systems to their modern, intelligent, and versatile …

Computer audition: From task-specific machine learning to foundation models

A Triantafyllopoulos, I Tsangko, A Gebhard… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation models (FMs) are increasingly spearheading recent advances on a variety of
tasks that fall under the purview of computer audition--the use of machines to understand …

Unsupervised cross-corpus speech emotion recognition using a multi-source cycle-GAN

BH Su, CC Lee - IEEE Transactions on Affective Computing, 2022 - ieeexplore.ieee.org
Speech emotion recognition (SER) plays a crucial role in understanding user feelings when
developing artificial intelligence services. However, the data mismatch and label distortion …

INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition

A Triantafyllopoulos, A Batliner, S Rampp… - arXiv preprint arXiv …, 2024 - arxiv.org
We revisit the INTERSPEECH 2009 Emotion Challenge--the first ever speech emotion
recognition (SER) challenge--and evaluate a series of deep learning models that are …

Multistage linguistic conditioning of convolutional layers for speech emotion recognition

A Triantafyllopoulos, U Reichel, S Liu… - Frontiers in Computer …, 2023 - frontiersin.org
The effective fusion of text and audio information for categorical and dimensional speech
emotion recognition (SER) remains an open issue, especially given the …

Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction

A Triantafyllopoulos, M Song, Z Yang, X Jing… - arXiv preprint arXiv …, 2022 - arxiv.org
In this work, we explore a novel few-shot personalisation architecture for emotional
vocalisation prediction. The core contribution is an 'enrolment' encoder which utilises two …

Self-supervised learning for infant cry analysis

A Gorin, C Subakan, S Abdoli, J Wang… - … , Speech, and Signal …, 2023 - ieeexplore.ieee.org
In this paper, we explore self-supervised learning (SSL) for analyzing a first-of-its-kind
database of cry recordings containing clinical indications of more than a thousand …