Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to …
In this paper, we first provide a review of the state-of-the-art emotional voice conversion research, and the existing emotional speech databases. We then motivate the development …
Speech information can be roughly decomposed into four components: language content, timbre, pitch, and rhythm. Obtaining disentangled representations of these components is …
YH Chen, DY Wu, TH Wu, H Lee - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Recently, voice conversion (VC) has been widely studied. Many VC systems use disentangle-based learning techniques to separate the speaker and the linguistic content …
We present an unsupervised non-parallel many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2. Using a combination of …
Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speeches without using a parallel corpus. Recently, cycle-consistent adversarial …
W Zhang, BY Lim - Proceedings of the 2022 CHI Conference on Human …, 2022 - dl.acm.org
Machine learning models need to provide contrastive explanations, since people often seek to understand why a puzzling prediction occurred instead of some expected outcome …
H Tang, X Zhang, J Wang, N Cheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Voice Conversion (VC) refers to changing the timbre of a speech while retaining the discourse content. Recently, many works have focused on disentangle-based learning …
Non-parallel voice conversion (VC) is a technique for training voice converters without a parallel corpus. Cycle-consistent adversarial network-based VCs (CycleGAN-VC and …