Non-parallel sequence-to-sequence voice conversion with disentangled linguistic and speaker representations

JX Zhang, ZH Ling, LR Dai - IEEE/ACM Transactions on Audio …, 2019 - ieeexplore.ieee.org
This article presents a method of sequence-to-sequence (seq2seq) voice conversion using
non-parallel training data. In this method, disentangled linguistic and speaker …

ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion

H Kameoka, K Tanaka, D Kwaśny… - … on audio, speech …, 2020 - ieeexplore.ieee.org
This article proposes a voice conversion (VC) method using sequence-to-sequence
(seq2seq or S2S) learning, which flexibly converts not only the voice characteristics but also …

Any-to-many voice conversion with location-relative sequence-to-sequence modeling

S Liu, Y Cao, D Wang, X Wu, X Liu… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
This paper proposes an any-to-many location-relative, sequence-to-sequence (seq2seq),
non-parallel voice conversion approach, which utilizes text supervision during training. In …

Many-to-many voice conversion using conditional cycle-consistent adversarial networks

S Lee, BG Ko, K Lee, IC Yoo… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
Voice conversion (VC) refers to transforming the speaker characteristics of an utterance
without altering its linguistic contents. Many works on voice conversion require to have …

Any-to-one sequence-to-sequence voice conversion using self-supervised discrete speech representations

WC Huang, YC Wu, T Hayashi - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
We present a novel approach to any-to-one (A2O) voice conversion (VC) in a sequence-to-
sequence (seq2seq) framework. A2O VC aims to convert any speaker, including those …

Voice transformer network: Sequence-to-sequence voice conversion using transformer with text-to-speech pretraining

WC Huang, T Hayashi, YC Wu, H Kameoka… - arXiv preprint arXiv …, 2019 - arxiv.org
We introduce a novel sequence-to-sequence (seq2seq) voice conversion (VC) model based
on the Transformer architecture with text-to-speech (TTS) pretraining. Seq2seq VC models …

The sequence-to-sequence baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS

WC Huang, T Hayashi, S Watanabe, T Toda - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents the sequence-to-sequence (seq2seq) baseline system for the voice
conversion challenge (VCC) 2020. We consider a naive approach for voice conversion (VC) …

DRVC: A framework of any-to-any voice conversion with self-supervised learning

Q Wang, X Zhang, J Wang, N Cheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Any-to-any voice conversion problem aims to convert voices for source and target speakers,
which are out of the training data. Previous works widely utilize the disentangle-based …

Many-to-many voice transformer network

H Kameoka, WC Huang, K Tanaka… - … on Audio, Speech …, 2020 - ieeexplore.ieee.org
This paper proposes a voice conversion (VC) method based on a sequence-to-sequence
(S2S) learning framework, which enables simultaneous conversion of the voice …

Sequence-to-sequence acoustic modeling for voice conversion

JX Zhang, ZH Ling, LJ Liu, Y Jiang… - IEEE/ACM Transactions …, 2019 - ieeexplore.ieee.org
In this paper, a neural network named sequence-to-sequence ConvErsion NeTwork
(SCENT) is presented for acoustic modeling in voice conversion. At training stage, a SCENT …