Sparse representation of phonetic features for voice conversion with and without parallel data

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org

Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Cited by 337 · Related articles · All 8 versions

[PDF] sciencedirect.com

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier

In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

Cited by 134 · Related articles · All 7 versions

[PDF] arxiv.org

Transforming spectrum and prosody for emotional voice conversion with non-parallel training data

K Zhou, B Sisman, H Li - arXiv preprint arXiv:2002.00198, 2020 - arxiv.org

Emotional voice conversion aims to convert the spectrum and prosody to change the
emotional patterns of speech, while preserving the speaker identity and linguistic content …

Cited by 81 · Related articles · All 7 versions

[PDF] ieee.org

Transfer learning from speech synthesis to voice conversion with non-parallel training data

M Zhang, Y Zhou, L Zhao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org

We present a novel voice conversion (VC) framework by learning from a text-to-speech
(TTS) synthesis system, that is called TTS-VC transfer learning or TTL-VC for short. We first …

Cited by 58 · Related articles · All 5 versions

[PDF] researchgate.net

Cross-lingual voice conversion with bilingual phonetic posteriorgram and average modeling

Y Zhou, X Tian, H Xu, RK Das… - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org

This paper presents a cross-lingual voice conversion approach using bilingual Phonetic
PosteriorGram (PPG) and average modeling. The proposed approach makes use of …

Cited by 88 · Related articles · All 5 versions

[PDF] arxiv.org

VAW-GAN for disentanglement and recomposition of emotional elements in speech

K Zhou, B Sisman, H Li - 2021 IEEE spoken language …, 2021 - ieeexplore.ieee.org

Emotional voice conversion (EVC) aims to convert the emotion of speech from one state to
another while preserving the linguistic content and speaker identity. In this paper, we study …

Cited by 39 · Related articles · All 4 versions

[PDF] arxiv.org

Limited data emotional voice conversion leveraging text-to-speech: Two-stage sequence-to-sequence training

K Zhou, B Sisman, H Li - arXiv preprint arXiv:2103.16809, 2021 - arxiv.org

Emotional voice conversion (EVC) aims to change the emotional state of an utterance while
preserving the linguistic content and speaker identity. In this paper, we propose a novel 2 …

Cited by 35 · Related articles · All 7 versions

[PDF] ieee.org

Unsupervised representation disentanglement using cross domain features and adversarial learning in variational autoencoder based voice conversion

WC Huang, H Luo, HT Hwang, CC Lo… - … on Emerging Topics …, 2020 - ieeexplore.ieee.org

An effective approach for voice conversion (VC) is to disentangle linguistic content from
other components in the speech signal. The effectiveness of variational autoencoder (VAE) …

Cited by 49 · Related articles · All 7 versions

[PDF] academia.edu

SINGAN: Singing voice conversion with generative adversarial networks

B Sisman, K Vijayan, M Dong… - 2019 Asia-Pacific Signal …, 2019 - ieeexplore.ieee.org

Singing voice conversion (SVC) is a task to convert the source singer's voice to sound like
that of the target singer, without changing the lyrical content. So far, most of the voice …

Cited by 45 · Related articles · All 4 versions

[PDF] a-star.edu.sg

Semantically consistent text to fashion image synthesis with an enhanced attentional generative adversarial network

KE Ak, JH Lim, JY Tham, AA Kassim - Pattern Recognition Letters, 2020 - Elsevier

Recent advancements in Generative Adversarial Networks (GANs) have led to
significant improvements in various image generation tasks including image synthesis …

Cited by 39 · Related articles · All 3 versions
