Speaker identity is one of the important characteristics of human speech. In voice conversion, we change the speaker identity from one to another, while keeping the linguistic …
Easy access to audio-visual content on social media, combined with the availability of modern tools such as TensorFlow or Keras, and open-source trained models, along with …
Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to …
Non-parallel multi-domain voice conversion (VC) is a technique for learning mappings among multiple domains without relying on parallel data. This is important but challenging …
YH Chen, DY Wu, TH Wu, H Lee - ICASSP 2021 - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Recently, voice conversion (VC) has been widely studied. Many VC systems use disentangle-based learning techniques to separate the speaker and the linguistic content …
Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speeches without using a parallel corpus. Recently, cycle-consistent adversarial …
JX Zhang, ZH Ling, LR Dai - IEEE/ACM Transactions on Audio …, 2019 - ieeexplore.ieee.org
This article presents a method of sequence-to-sequence (seq2seq) voice conversion using non-parallel training data. In this method, disentangled linguistic and speaker …
K Man, J Chahl - Journal of Imaging, 2022 - mdpi.com
Development of computer vision algorithms using convolutional neural networks and deep learning has necessitated ever greater amounts of annotated and labelled data to produce …
SH Lee, JH Kim, H Chung… - Advances in Neural …, 2021 - proceedings.neurips.cc
Although recent advances in voice conversion have shown significant improvement, there still remains a gap between the converted voice and target voice. A key factor that maintains …