An overview of affective speech synthesis and conversion in the deep learning era

A Triantafyllopoulos, BW Schuller… - Proceedings of the …, 2023 - ieeexplore.ieee.org
Speech is the fundamental mode of human communication, and its synthesis has long been
a core priority in human–computer interaction research. In recent years, machines have …

Automatic speech recognition using limited vocabulary: A survey

JLKE Fendji, DCM Tala, BO Yenke… - Applied Artificial …, 2022 - Taylor & Francis
ABSTRACT Automatic Speech Recognition (ASR) is an active field of research due to its
large number of applications and the proliferation of interfaces or computing devices that …

Textless speech emotion conversion using discrete and decomposed representations

F Kreuk, A Polyak, J Copet, E Kharitonov… - arXiv preprint arXiv …, 2021 - arxiv.org
Speech emotion conversion is the task of modifying the perceived emotion of a speech
utterance while preserving the lexical content and speaker identity. In this study, we cast the …

Emotion intensity and its control for emotional voice conversion

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while
preserving the linguistic content and speaker identity. In EVC, emotions are usually treated …

Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

Probing speech emotion recognition transformers for linguistic knowledge

A Triantafyllopoulos, J Wagner, H Wierstorf… - arXiv preprint arXiv …, 2022 - arxiv.org
Large, pre-trained neural networks consisting of self-attention layers (transformers) have
recently achieved state-of-the-art results on several speech emotion recognition (SER) …

Limited data emotional voice conversion leveraging text-to-speech: Two-stage sequence-to-sequence training

K Zhou, B Sisman, H Li - arXiv preprint arXiv:2103.16809, 2021 - arxiv.org
Emotional voice conversion (EVC) aims to change the emotional state of an utterance while
preserving the linguistic content and speaker identity. In this paper, we propose a novel 2 …

Neural Emotion Director: Speech-preserving semantic control of facial expressions in "in-the-wild" videos

FP Papantoniou, PP Filntisis… - Proceedings of the …, 2022 - openaccess.thecvf.com
In this paper, we introduce a novel deep learning method for photo-realistic manipulation of
the emotional state of actors in "in-the-wild" videos. The proposed method is based on a …

Grad-StyleSpeech: Any-speaker adaptive text-to-speech synthesis with diffusion models

M Kang, D Min, SJ Hwang - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
There has been significant progress in Text-To-Speech (TTS) synthesis technology in
recent years, thanks to the advancement in neural generative modeling. However, existing …

PMVC: Data augmentation-based prosody modeling for expressive voice conversion

Y Deng, H Tang, X Zhang, J Wang, N Cheng… - Proceedings of the 31st …, 2023 - dl.acm.org
Voice conversion, as the style transfer task applied to speech, refers to converting one
person's speech into a new speech that sounds like another person's. Up to now, there has …