An overview of affective speech synthesis and conversion in the deep learning era

A Triantafyllopoulos, BW Schuller… - Proceedings of the …, 2023 - ieeexplore.ieee.org
Speech is the fundamental mode of human communication, and its synthesis has long been
a core priority in human–computer interaction research. In recent years, machines have …

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier
In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

Textless speech emotion conversion using discrete and decomposed representations

F Kreuk, A Polyak, J Copet, E Kharitonov… - arXiv preprint arXiv …, 2021 - arxiv.org
Speech emotion conversion is the task of modifying the perceived emotion of a speech
utterance while preserving the lexical content and speaker identity. In this study, we cast the …

Emotion intensity and its control for emotional voice conversion

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while
preserving the linguistic content and speaker identity. In EVC, emotions are usually treated …

An overview & analysis of sequence-to-sequence emotional voice conversion

Z Yang, X Jing, A Triantafyllopoulos, M Song… - arXiv preprint arXiv …, 2022 - arxiv.org
Emotional voice conversion (EVC) focuses on converting a speech utterance from a source
to a target emotion; it can thus be a key enabling technology for human-computer interaction …

Limited data emotional voice conversion leveraging text-to-speech: Two-stage sequence-to-sequence training

K Zhou, B Sisman, H Li - arXiv preprint arXiv:2103.16809, 2021 - arxiv.org
Emotional voice conversion (EVC) aims to change the emotional state of an utterance while
preserving the linguistic content and speaker identity. In this paper, we propose a novel 2 …

Generative emotional AI for speech emotion recognition: The case for synthetic emotional speech augmentation

S Latif, A Shahid, J Qadir - Applied Acoustics, 2023 - Elsevier
Despite advances in deep learning, current state-of-the-art speech emotion recognition
(SER) systems still have poor performance due to a lack of speech emotion datasets. This …

DurFlex-EVC: Duration-flexible emotional voice conversion with parallel generation

HS Oh, SH Lee, DH Cho, SW Lee - arXiv preprint arXiv:2401.08095, 2024 - arxiv.org
Emotional voice conversion involves modifying the pitch, spectral envelope, and other
acoustic characteristics of speech to match a desired emotional state while maintaining the …

Improve few-shot voice cloning using multi-modal learning

H Zhang, Y Lin - … 2022-2022 IEEE International Conference on …, 2022 - ieeexplore.ieee.org
Recently, few-shot voice cloning has achieved a significant improvement. However, most
models for few-shot voice cloning are single-modal, and multi-modal few-shot voice cloning …

An improved CycleGAN-based emotional voice conversion model by augmenting temporal dependency with a transformer

C Fu, C Liu, CT Ishi, H Ishiguro - Speech Communication, 2022 - Elsevier
Emotional voice conversion (EVC) is a task that converts an utterance's emotional features
into a target one while retaining semantic information and speaker identity. Recently, some …