AutoVC: Zero-shot voice style transfer with only autoencoder loss

K Qian, Y Zhang, S Chang, X Yang… - International …, 2019 - proceedings.mlr.press
Despite the progress in voice conversion, many-to-many voice conversion trained on
non-parallel data, as well as zero-shot voice conversion, remains under-explored. Deep style …

Improving zero-shot voice style transfer via disentangled representation learning

S Yuan, P Cheng, R Zhang, W Hao, Z Gan… - arXiv preprint arXiv …, 2021 - arxiv.org
Voice style transfer, also called voice conversion, seeks to modify one speaker's voice to
generate speech as if it came from another (target) speaker. Previous works have made …

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

K Qian, Z Jin, M Hasegawa-Johnson… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Non-parallel many-to-many voice conversion remains an interesting but challenging speech
processing task. Many style-transfer-inspired methods such as generative adversarial …

VoiceMixer: Adversarial voice style mixup

SH Lee, JH Kim, H Chung… - Advances in Neural …, 2021 - proceedings.neurips.cc
Although recent advances in voice conversion have shown significant improvement, there
still remains a gap between the converted voice and target voice. A key factor that maintains …

StarGANv2-VC: A diverse, unsupervised, non-parallel framework for natural-sounding voice conversion

YA Li, A Zare, N Mesgarani - arXiv preprint arXiv:2107.10394, 2021 - arxiv.org
We present an unsupervised non-parallel many-to-many voice conversion (VC) method
using a generative adversarial network (GAN) called StarGAN v2. Using a combination of …

MelGAN-VC: Voice conversion and audio style transfer on arbitrarily long samples using spectrograms

M Pasini - arXiv preprint arXiv:1910.03713, 2019 - arxiv.org
Traditional voice conversion methods rely on parallel recordings of multiple speakers
pronouncing the same sentences. For real-world applications however, parallel data is …

Parallel-data-free voice conversion using cycle-consistent adversarial networks

T Kaneko, H Kameoka - arXiv preprint arXiv:1711.11293, 2017 - arxiv.org
We propose a parallel-data-free voice-conversion (VC) method that can learn a mapping
from source to target speech without relying on parallel data. The proposed method is …

NVC-Net: End-to-end adversarial voice conversion

B Nguyen, F Cardinaux - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Voice conversion (VC) has gained increasing popularity in many speech synthesis
applications. The idea is to change the voice identity from one speaker into another while …

SINGAN: Singing voice conversion with generative adversarial networks

B Sisman, K Vijayan, M Dong… - 2019 Asia-Pacific Signal …, 2019 - ieeexplore.ieee.org
Singing voice conversion (SVC) is a task to convert the source singer's voice to sound like
that of the target singer, without changing the lyrical content. So far, most of the voice …

Transfer learning from speech synthesis to voice conversion with non-parallel training data

M Zhang, Y Zhou, L Zhao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
We present a novel voice conversion (VC) framework by learning from a text-to-speech
(TTS) synthesis system, that is called TTS-VC transfer learning or TTL-VC for short. We first …