Non-parallel sequence-to-sequence voice conversion with disentangled linguistic and speaker...

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 141 · Related articles · All 6 versions

[PDF] ieee.org

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org

Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Cited by 370 · Related articles · All 8 versions

[PDF] sciencedirect.com

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier

In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

Cited by 150 · Related articles · All 7 versions

[PDF] neurips.cc

Disentangling voice and content with self-supervision for speaker recognition

T Liu, KA Lee, Q Wang, H Li - Advances in Neural …, 2023 - proceedings.neurips.cc

For speaker recognition, it is difficult to extract an accurate speaker representation from
speech because of its mixture of speaker traits and content. This paper proposes a …

Cited by 23 · Related articles · All 9 versions

[PDF] arxiv.org

CycleGAN-VC3: Examining and improving CycleGAN-VCs for mel-spectrogram conversion

T Kaneko, H Kameoka, K Tanaka, N Hojo - arXiv preprint arXiv …, 2020 - arxiv.org

Non-parallel voice conversion (VC) is a technique for learning mappings between source
and target speeches without using a parallel corpus. Recently, cycle-consistent adversarial …

Cited by 98 · Related articles · All 8 versions

[PDF] arxiv.org

Any-to-many voice conversion with location-relative sequence-to-sequence modeling

S Liu, Y Cao, D Wang, X Wu, X Liu… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org

This paper proposes an any-to-many location-relative, sequence-to-sequence (seq2seq),
non-parallel voice conversion approach, which utilizes text supervision during training. In …

Cited by 94 · Related articles · All 7 versions

[PDF] ieee.org

Emotion intensity and its control for emotional voice conversion

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while
preserving the linguistic content and speaker identity. In EVC, emotions are usually treated …

Cited by 49 · Related articles · All 7 versions

[PDF] ieee.org

Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

Cited by 44 · Related articles · All 7 versions

[PDF] arxiv.org

Voice transformer network: Sequence-to-sequence voice conversion using transformer with text-to-speech pretraining

WC Huang, T Hayashi, YC Wu, H Kameoka… - arXiv preprint arXiv …, 2019 - arxiv.org

We introduce a novel sequence-to-sequence (seq2seq) voice conversion (VC) model based
on the Transformer architecture with text-to-speech (TTS) pretraining. Seq2seq VC models …

Cited by 109 · Related articles · All 8 versions

[PDF] arxiv.org

MaskCycleGAN-VC: Learning non-parallel voice conversion with filling in frames

T Kaneko, H Kameoka, K Tanaka… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Non-parallel voice conversion (VC) is a technique for training voice converters without a
parallel corpus. Cycle-consistent adversarial network-based VCs (CycleGAN-VC and …

Cited by 68 · Related articles · All 4 versions
