An overview of affective speech synthesis and conversion in the deep learning era

A Triantafyllopoulos, BW Schuller… - Proceedings of the …, 2023 - ieeexplore.ieee.org
Speech is the fundamental mode of human communication, and its synthesis has long been
a core priority in human–computer interaction research. In recent years, machines have …

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier
In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

Textless speech emotion conversion using discrete and decomposed representations

F Kreuk, A Polyak, J Copet, E Kharitonov… - arXiv preprint arXiv …, 2021 - arxiv.org
Speech emotion conversion is the task of modifying the perceived emotion of a speech
utterance while preserving the lexical content and speaker identity. In this study, we cast the …

Emotion intensity and its control for emotional voice conversion

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while
preserving the linguistic content and speaker identity. In EVC, emotions are usually treated …

An overview & analysis of sequence-to-sequence emotional voice conversion

Z Yang, X Jing, A Triantafyllopoulos, M Song… - arXiv preprint arXiv …, 2022 - arxiv.org
Emotional voice conversion (EVC) focuses on converting a speech utterance from a source
to a target emotion; it can thus be a key enabling technology for human-computer interaction …

CopyPaste: An augmentation method for speech emotion recognition

R Pappagari, J Villalba, P Żelasko… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Data augmentation is a widely used strategy for training robust machine learning models. It
partially alleviates the problem of limited data for tasks like speech emotion recognition …

End-to-end modeling and transfer learning for audiovisual emotion recognition in-the-wild

D Dresvyanskiy, E Ryumina, H Kaya… - Multimodal …, 2022 - mdpi.com
As emotions play a central role in human communication, automatic emotion recognition has
attracted increasing attention in the last two decades. While multimodal systems enjoy high …

Limited data emotional voice conversion leveraging text-to-speech: Two-stage sequence-to-sequence training

K Zhou, B Sisman, H Li - arXiv preprint arXiv:2103.16809, 2021 - arxiv.org
Emotional voice conversion (EVC) aims to change the emotional state of an utterance while
preserving the linguistic content and speaker identity. In this paper, we propose a novel 2 …

Leveraging speech PTM, text LLM, and emotional TTS for speech emotion recognition

Z Ma, W Wu, Z Zheng, Y Guo, Q Chen… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
In this paper, we explored how to boost speech emotion recognition (SER) with the
state-of-the-art speech pre-trained model (PTM), data2vec, text generation technique, GPT-4, and …

Towards General-Purpose Text-Instruction-Guided Voice Conversion

CY Kuan, CA Li, TY Hsu, TY Lin… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
This paper introduces a novel voice conversion (VC) model, guided by text instructions such
as “articulate slowly with a deep tone” or “speak in a cheerful boyish voice”. Unlike …