An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Generative adversarial networks for speech processing: A review

A Wali, Z Alamgir, S Karim, A Fawaz, MB Ali… - Computer Speech & …, 2022 - Elsevier
Generative adversarial networks (GANs) have seen remarkable progress in recent years.
They are used as generative models for all kinds of data such as text, images, audio, music …

StarGAN-VC: Non-parallel many-to-many voice conversion using star generative adversarial networks

H Kameoka, T Kaneko, K Tanaka… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
This paper proposes a method that allows non-parallel many-to-many voice conversion (VC)
by using a variant of a generative adversarial network (GAN) called StarGAN. Our method …

CycleGAN-VC2: Improved CycleGAN-based non-parallel voice conversion

T Kaneko, H Kameoka, K Tanaka… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
Non-parallel voice conversion (VC) is a technique for learning the mapping from source to
target speech without relying on parallel data. This is an important task, but it has been …

CycleGAN-VC: Non-parallel voice conversion using cycle-consistent adversarial networks

T Kaneko, H Kameoka - 2018 26th European Signal …, 2018 - ieeexplore.ieee.org
We propose a non-parallel voice-conversion (VC) method that can learn a mapping from
source to target speech without relying on parallel data. The proposed method is particularly …

Parallel-data-free voice conversion using cycle-consistent adversarial networks

T Kaneko, H Kameoka - arXiv preprint arXiv:1711.11293, 2017 - arxiv.org
We propose a parallel-data-free voice-conversion (VC) method that can learn a mapping
from source to target speech without relying on parallel data. The proposed method is …

Time-frequency masking-based speech enhancement using generative adversarial network

MH Soni, N Shah, HA Patil - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
The success of time-frequency (TF) mask-based approaches is dependent on the accuracy
of predicted mask given the noisy spectral features. The state-of-the-art methods in TF …

StarGAN-VC2: Rethinking conditional methods for StarGAN-based voice conversion

T Kaneko, H Kameoka, K Tanaka, N Hojo - arXiv preprint arXiv …, 2019 - arxiv.org
Non-parallel multi-domain voice conversion (VC) is a technique for learning mappings
among multiple domains without relying on parallel data. This is important but challenging …

ACVAE-VC: Non-parallel voice conversion with auxiliary classifier variational autoencoder

H Kameoka, T Kaneko, K Tanaka… - IEEE/ACM Transactions …, 2019 - ieeexplore.ieee.org
This paper proposes a non-parallel voice conversion (VC) method using a variant of the
conditional variational autoencoder (VAE) called an auxiliary classifier VAE. The proposed …

Sequence-to-sequence acoustic modeling for voice conversion

JX Zhang, ZH Ling, LJ Liu, Y Jiang… - IEEE/ACM Transactions …, 2019 - ieeexplore.ieee.org
In this paper, a neural network named sequence-to-sequence ConvErsion NeTwork
(SCENT) is presented for acoustic modeling in voice conversion. At training stage, a SCENT …