An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

A literature review and perspectives in deepfakes: generation, detection, and applications

D Dagar, DK Vishwakarma - International journal of multimedia information …, 2022 - Springer
In the last few years, with the advancement of deep learning methods, especially Generative
Adversarial Networks (GANs) and Variational Auto-encoders (VAEs), fabricated content has …

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier
In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

FreeVC: Towards high-quality text-free one-shot voice conversion

J Li, W Tu, L Xiao - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
Voice conversion (VC) can be achieved by first extracting source content information and
target speaker information, and then reconstructing waveform with these information …

Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

Robust disentangled variational speech representation learning for zero-shot voice conversion

J Lian, C Zhang, D Yu - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
Traditional studies on voice conversion (VC) have made progress with parallel training data
and known speakers. Good voice conversion quality is obtained by exploring better …

StyleTTS-VC: One-shot voice conversion by knowledge transfer from style-based TTS models

YA Li, C Han, N Mesgarani - 2022 IEEE Spoken Language …, 2023 - ieeexplore.ieee.org
One-shot voice conversion (VC) aims to convert speech from any source speaker to an
arbitrary target speaker with only a few seconds of reference speech from the target speaker …

AV-TranSpeech: Audio-visual robust speech-to-speech translation

R Huang, H Liu, X Cheng, Y Ren, L Li, Z Ye… - arXiv preprint arXiv …, 2023 - arxiv.org
Direct speech-to-speech translation (S2ST) aims to convert speech from one language into
another, and has demonstrated significant progress to date. Despite the recent success …

Towards improved zero-shot voice conversion with conditional DSVAE

J Lian, C Zhang, GK Anumanchipalli, D Yu - arXiv preprint arXiv …, 2022 - arxiv.org
Disentangling content and speaking style information is essential for zero-shot non-parallel
voice conversion (VC). Our previous study investigated a novel framework with disentangled …

RefXVC: Cross-lingual voice conversion with enhanced reference leveraging

M Zhang, Y Zhou, Y Ren, C Zhang… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org
This paper proposes RefXVC, a method for cross-lingual voice conversion (XVC) that
leverages reference information to improve conversion performance. Previous XVC works …