Any-to-many voice conversion with location-relative sequence-to-sequence modeling

S Liu, Y Cao, D Wang, X Wu, X Liu… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
This paper proposes an any-to-many location-relative, sequence-to-sequence (seq2seq),
non-parallel voice conversion approach, which utilizes text supervision during training. In …

Non-parallel sequence-to-sequence voice conversion with disentangled linguistic and speaker representations

JX Zhang, ZH Ling, LR Dai - IEEE/ACM Transactions on Audio …, 2019 - ieeexplore.ieee.org
This article presents a method of sequence-to-sequence (seq2seq) voice conversion using
non-parallel training data. In this method, disentangled linguistic and speaker …

ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion

H Kameoka, K Tanaka, D Kwaśny… - … on audio, speech …, 2020 - ieeexplore.ieee.org
This article proposes a voice conversion (VC) method using sequence-to-sequence
(seq2seq or S2S) learning, which flexibly converts not only the voice characteristics but also …

Duration controllable voice conversion via phoneme-based information bottleneck

SH Lee, HR Noh, WJ Nam… - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Several voice conversion (VC) methods using a simple autoencoder with a carefully
designed information bottleneck have recently been studied. In general, they extract content …

Any-to-one sequence-to-sequence voice conversion using self-supervised discrete speech representations

WC Huang, YC Wu, T Hayashi - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
We present a novel approach to any-to-one (A2O) voice conversion (VC) in a sequence-to-
sequence (seq2seq) framework. A2O VC aims to convert any speaker, including those …

Pretraining techniques for sequence-to-sequence voice conversion

WC Huang, T Hayashi, YC Wu… - … /ACM Transactions on …, 2021 - ieeexplore.ieee.org
Sequence-to-sequence (seq2seq) voice conversion (VC) models are attractive owing to
their ability to convert prosody. Nonetheless, without sufficient data, seq2seq VC models can …

The sequence-to-sequence baseline for the voice conversion challenge 2020: Cascading ASR and TTS

WC Huang, T Hayashi, S Watanabe, T Toda - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents the sequence-to-sequence (seq2seq) baseline system for the voice
conversion challenge (VCC) 2020. We consider a naive approach for voice conversion (VC) …

Voice Transformer network: Sequence-to-sequence voice conversion using Transformer with text-to-speech pretraining

WC Huang, T Hayashi, YC Wu, H Kameoka… - arXiv preprint arXiv …, 2019 - arxiv.org
We introduce a novel sequence-to-sequence (seq2seq) voice conversion (VC) model based
on the Transformer architecture with text-to-speech (TTS) pretraining. Seq2seq VC models …

S2VC: A framework for any-to-any voice conversion with self-supervised pretrained representations

J Lin, YY Lin, CM Chien, H Lee - arXiv preprint arXiv:2104.02901, 2021 - arxiv.org
Any-to-any voice conversion (VC) aims to convert the timbre of utterances from and to any
speakers seen or unseen during training. Various any-to-any VC approaches have been …

AttS2S-VC: Sequence-to-sequence voice conversion with attention and context preservation mechanisms

K Tanaka, H Kameoka, T Kaneko… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
This paper describes a method based on a sequence-to-sequence learning (Seq2Seq) with
attention and context preservation mechanism for voice conversion (VC) tasks. Seq2Seq …