High-quality nonparallel voice conversion based on cycle-consistent adversarial network

J Lu, K Zhou, B Sisman, H Li - 2020 Asia-Pacific Signal and …, 2020 - ieeexplore.ieee.org

Singing voice conversion aims to convert singer's voice from source to target without
changing singing content. Parallel training data is typically required for the training of …

Cited by 19 · Related articles · All 6 versions

[PDF] arxiv.org

Towards General-Purpose Text-Instruction-Guided Voice Conversion

CY Kuan, CA Li, TY Hsu, TY Lin… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

This paper introduces a novel voice conversion (VC) model, guided by text instructions such
as “articulate slowly with a deep tone” or “speak in a cheerful boyish voice”. Unlike …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org

Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech

HT Luong, J Yamagishi - 2019 IEEE Automatic Speech …, 2019 - ieeexplore.ieee.org

Voice conversion (VC) and text-to-speech (TTS) are two tasks that share a similar objective,
generating speech with a target voice. However, they are usually developed independently …

Cited by 26 · Related articles · All 7 versions

[HTML] mdpi.com

[HTML] Manipulating voice attributes by adversarial learning of structured disentangled representations

L Benaroya, N Obin, A Roebel - Entropy, 2023 - mdpi.com

Voice conversion (VC) consists of digitally altering the voice of an individual to manipulate
part of its content, primarily its identity, while maintaining the rest unchanged. Research in …

Cited by 3 · Related articles · All 13 versions

Face-based voice conversion: Learning the voice behind a face

HH Lu, SE Weng, YF Yen, HH Shuai… - Proceedings of the 29th …, 2021 - dl.acm.org

Zero-shot voice conversion (VC) trained by non-parallel data has gained a lot of attention in
recent years. Previous methods usually extract speaker embeddings from audios and use …

Cited by 9 · Related articles · All 2 versions

[PDF] arxiv.org

Rhythm-flexible voice conversion without parallel data using cycle-gan over phoneme posteriorgram sequences

C Yeh, P Hsu, J Chou, H Lee… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org

Speaking rate refers to the average number of phonemes within some unit time, while the
rhythmic patterns refer to duration distributions for realizations of different phonemes within …

Cited by 28 · Related articles · All 7 versions

[BOOK] The science of deep learning

I Drori - 2022 - books.google.com

The Science of Deep Learning emerged from courses taught by the author that have
provided thousands of students with training and experience for their academic studies, and …

Cited by 20 · Related articles · All 5 versions

[PDF] isca-archive.org

[PDF] One-Shot Voice Conversion with Disentangled Representations by Leveraging Phonetic Posteriorgrams.

SH Mohammadi, T Kim - INTERSPEECH, 2019 - isca-archive.org

We propose voice conversion model from arbitrary source speaker to arbitrary target
speaker with disentangled representations. Voice conversion is a task to convert the voice of …

Cited by 19 · Related articles · All 7 versions

[PDF] researchgate.net

[PDF] Generative Adversarial Networks for Singing Voice Conversion with and without Parallel Data.

B Sisman, H Li - Odyssey, 2020 - researchgate.net

Singing voice conversion (SVC) is a task to convert one singer's voice to sound like that of
another, without changing the lyrical content. Singing conveys lexical and emotional …

Cited by 16 · Related articles · All 5 versions

[PDF] isca-archive.org

[PDF] Speaker anonymization by pitch shifting based on time-scale modification

CO Mawalim, S Okada, M Unoki - 2nd Symposium on Security and …, 2022 - isca-archive.org

The increasing usage of speech in digital technology raises a privacy issue because speech
contains biometric information. Several methods of dealing with this issue have been …

Cited by 10 · Related articles · All 6 versions
