VAW-GAN for singing voice conversion with non-parallel training data

J Lu, K Zhou, B Sisman, H Li - 2020 Asia-Pacific Signal and …, 2020 - ieeexplore.ieee.org
Singing voice conversion aims to convert singer's voice from source to target without
changing singing content. Parallel training data is typically required for the training of …

Towards General-Purpose Text-Instruction-Guided Voice Conversion

CY Kuan, CA Li, TY Hsu, TY Lin… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
This paper introduces a novel voice conversion (VC) model, guided by text instructions such
as “articulate slowly with a deep tone” or “speak in a cheerful boyish voice”. Unlike …

Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech

HT Luong, J Yamagishi - 2019 IEEE Automatic Speech …, 2019 - ieeexplore.ieee.org
Voice conversion (VC) and text-to-speech (TTS) are two tasks that share a similar objective,
generating speech with a target voice. However, they are usually developed independently …

[HTML] Manipulating voice attributes by adversarial learning of structured disentangled representations

L Benaroya, N Obin, A Roebel - Entropy, 2023 - mdpi.com
Voice conversion (VC) consists of digitally altering the voice of an individual to manipulate
part of its content, primarily its identity, while maintaining the rest unchanged. Research in …

Face-based voice conversion: Learning the voice behind a face

HH Lu, SE Weng, YF Yen, HH Shuai… - Proceedings of the 29th …, 2021 - dl.acm.org
Zero-shot voice conversion (VC) trained by non-parallel data has gained a lot of attention in
recent years. Previous methods usually extract speaker embeddings from audios and use …

Rhythm-flexible voice conversion without parallel data using Cycle-GAN over phoneme posteriorgram sequences

C Yeh, P Hsu, J Chou, H Lee… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
Speaking rate refers to the average number of phonemes within some unit time, while the
rhythmic patterns refer to duration distributions for realizations of different phonemes within …

[Book] The science of deep learning

I Drori - 2022 - books.google.com
The Science of Deep Learning emerged from courses taught by the author that have
provided thousands of students with training and experience for their academic studies, and …

[PDF] One-Shot Voice Conversion with Disentangled Representations by Leveraging Phonetic Posteriorgrams.

SH Mohammadi, T Kim - INTERSPEECH, 2019 - isca-archive.org
We propose voice conversion model from arbitrary source speaker to arbitrary target
speaker with disentangled representations. Voice conversion is a task to convert the voice of …

[PDF] Generative Adversarial Networks for Singing Voice Conversion with and without Parallel Data.

B Sisman, H Li - Odyssey, 2020 - researchgate.net
Singing voice conversion (SVC) is a task to convert one singer's voice to sound like that of
another, without changing the lyrical content. Singing conveys lexical and emotional …

[PDF] Speaker anonymization by pitch shifting based on time-scale modification

CO Mawalim, S Okada, M Unoki - 2nd Symposium on Security and …, 2022 - isca-archive.org
The increasing usage of speech in digital technology raises a privacy issue because speech
contains biometric information. Several methods of dealing with this issue have been …