Transfer learning from speech synthesis to voice conversion with non-parallel training data

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org

Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Cited by 417 | Related articles | All 8 versions

A literature review and perspectives in deepfakes: generation, detection, and applications

D Dagar, DK Vishwakarma - International journal of multimedia information …, 2022 - Springer

In the last few years, with the advancement of deep learning methods, especially Generative
Adversarial Networks (GANs) and Variational Auto-encoders (VAEs), fabricated content has …

Cited by 53 | Related articles | All 2 versions

[PDF] sciencedirect.com

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier

In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

Cited by 180 | Related articles | All 7 versions

[PDF] arxiv.org

FreeVC: Towards high-quality text-free one-shot voice conversion

J Li, W Tu, L Xiao - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org

Voice conversion (VC) can be achieved by first extracting source content information and
target speaker information, and then reconstructing waveform with these information …

Cited by 112 | Related articles | All 2 versions

[PDF] ieee.org

Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

Cited by 55 | Related articles | All 7 versions

[PDF] arxiv.org

Robust disentangled variational speech representation learning for zero-shot voice conversion

J Lian, C Zhang, D Yu - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org

Traditional studies on voice conversion (VC) have made progress with parallel training data
and known speakers. Good voice conversion quality is obtained by exploring better …

Cited by 51 | Related articles | All 7 versions

[HTML] nih.gov

StyleTTS-VC: One-shot voice conversion by knowledge transfer from style-based TTS models

YA Li, C Han, N Mesgarani - 2022 IEEE Spoken Language …, 2023 - ieeexplore.ieee.org

One-shot voice conversion (VC) aims to convert speech from any source speaker to an
arbitrary target speaker with only a few seconds of reference speech from the target speaker …

Cited by 17 | Related articles | All 6 versions

[PDF] arxiv.org

AV-TranSpeech: Audio-visual robust speech-to-speech translation

R Huang, H Liu, X Cheng, Y Ren, L Li, Z Ye… - arXiv preprint arXiv …, 2023 - arxiv.org

Direct speech-to-speech translation (S2ST) aims to convert speech from one language into
another, and has demonstrated significant progress to date. Despite the recent success …

Cited by 14 | Related articles | All 5 versions

[PDF] arxiv.org

Towards improved zero-shot voice conversion with conditional DSVAE

J Lian, C Zhang, GK Anumanchipalli, D Yu - arXiv preprint arXiv …, 2022 - arxiv.org

Disentangling content and speaking style information is essential for zero-shot non-parallel
voice conversion (VC). Our previous study investigated a novel framework with disentangled …

Cited by 22 | Related articles | All 7 versions

[PDF] arxiv.org

RefXVC: Cross-lingual voice conversion with enhanced reference leveraging

M Zhang, Y Zhou, Y Ren, C Zhang… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org

This paper proposes RefXVC, a method for cross-lingual voice conversion (XVC) that
leverages reference information to improve conversion performance. Previous XVC works …

Cited by 2 | Related articles | All 4 versions
