An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

An overview of voice conversion systems

SH Mohammadi, A Kain - Speech Communication, 2017 - Elsevier
Voice transformation (VT) aims to change one or more aspects of a speech signal while
preserving linguistic information. A subset of VT, Voice conversion (VC) specifically aims to …

CycleGAN-VC: Non-parallel voice conversion using cycle-consistent adversarial networks

T Kaneko, H Kameoka - 2018 26th European Signal …, 2018 - ieeexplore.ieee.org
We propose a non-parallel voice-conversion (VC) method that can learn a mapping from
source to target speech without relying on parallel data. The proposed method is particularly …

Parallel-data-free voice conversion using cycle-consistent adversarial networks

T Kaneko, H Kameoka - arXiv preprint arXiv:1711.11293, 2017 - arxiv.org
We propose a parallel-data-free voice-conversion (VC) method that can learn a mapping
from source to target speech without relying on parallel data. The proposed method is …

Statistical parametric speech synthesis incorporating generative adversarial networks

Y Saito, S Takamichi… - IEEE/ACM Transactions on …, 2017 - ieeexplore.ieee.org
A method for statistical parametric speech synthesis incorporating generative adversarial
networks (GANs) is proposed. Although powerful deep neural networks techniques can be …

Voice conversion using deep bidirectional long short-term memory based recurrent neural networks

L Sun, S Kang, K Li, H Meng - 2015 IEEE international …, 2015 - ieeexplore.ieee.org
This paper investigates the use of Deep Bidirectional Long Short-Term Memory based
Recurrent Neural Networks (DBLSTM-RNNs) for voice conversion. Temporal correlations …

Sequence-to-sequence acoustic modeling for voice conversion

JX Zhang, ZH Ling, LJ Liu, Y Jiang… - IEEE/ACM Transactions …, 2019 - ieeexplore.ieee.org
In this paper, a neural network named sequence-to-sequence ConvErsion NeTwork
(SCENT) is presented for acoustic modeling in voice conversion. At training stage, a SCENT …

D4C, a band-aperiodicity estimator for high-quality speech synthesis

M Morise - Speech Communication, 2016 - Elsevier
An algorithm is proposed for estimating the band aperiodicity of speech signals, where
“aperiodicity” is defined as the power ratio between the speech signal and the aperiodic …

The Voice Conversion Challenge 2016

T Toda, LH Chen, D Saito, F Villavicencio, M Wester… - Interspeech, 2016 - isca-archive.org
This paper describes the Voice Conversion Challenge 2016 devised by the authors to better
understand different voice conversion (VC) techniques by comparing their performance on a …

Method and system for non-parametric voice conversion

I Agiomyrgiannakis - US Patent 9,183,830, 2015 - Google Patents
A method and system is disclosed for non-parametric speech
conversion. A text-to-speech (TTS) synthesis system may …