High quality, lightweight and adaptable TTS using LPCNet

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 458 · Related articles · All 2 versions

[PDF] arxiv.org

AdaSpeech: Adaptive text to speech for custom voice

M Chen, X Tan, B Li, Y Liu, T Qin, S Zhao… - arXiv preprint arXiv …, 2021 - arxiv.org

Custom voice, a specific text to speech (TTS) service in commercial speech platforms, aims
to adapt a source TTS model to synthesize personal voice for a target speaker using few …

Cited by 194 · Related articles · All 3 versions

[PDF] arxiv.org

Zero-shot multi-speaker text-to-speech with state-of-the-art neural speaker embeddings

E Cooper, CI Lai, Y Yasuda, F Fang… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

While speaker adaptation for end-to-end speech synthesis using speaker embeddings can
produce good speaker similarity for speakers seen during training, there remains a gap for …

Cited by 218 · Related articles · All 7 versions

[PDF] arxiv.org

Attentron: Few-shot text-to-speech utilizing attention-based variable-length embedding

S Choi, S Han, D Kim, S Ha - arXiv preprint arXiv:2005.08484, 2020 - arxiv.org

On account of growing demands for personalization, the need for a so-called few-shot TTS
system that clones speakers with only a few data is emerging. To address this issue, we …

Cited by 80 · Related articles · All 9 versions

[PDF] arxiv.org

Leveraging unpaired text data for training end-to-end speech-to-intent systems

Y Huang, HK Kuo, S Thomas, Z Kons… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Training an end-to-end (E2E) neural network speech-to-intent (S2I) system that directly
extracts intents from speech requires large amounts of intent-labeled speech data, which is …

Cited by 75 · Related articles · All 4 versions

[PDF] arxiv.org

Audio Anti-Spoofing Detection: A Survey

M Li, Y Ahmadiadli, XP Zhang - arXiv preprint arXiv:2404.13914, 2024 - arxiv.org

The availability of smart devices leads to an exponential increase in multimedia content.
However, the rapid advancements in deep learning have given rise to sophisticated …

Cited by 17 · Related articles · All 2 versions

[PDF] openreview.net

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

Z Jiang, J Liu, Y Ren, J He, Z Ye, S Ji… - The Twelfth …, 2024 - openreview.net

Zero-shot text-to-speech (TTS) aims to synthesize voices with unseen speech prompts,
which significantly reduces the data and computation requirements for voice cloning by …

Cited by 25 · Related articles

[PDF] arxiv.org

GANSpeech: Adversarial training for high-fidelity multi-speaker speech synthesis

J Yang, JS Bae, T Bak, Y Kim, HY Cho - arXiv preprint arXiv:2106.15153, 2021 - arxiv.org

Recent advances in neural multi-speaker text-to-speech (TTS) models have enabled the
generation of reasonably good speech quality with a single model and made it possible to …

Cited by 39 · Related articles · All 6 versions

[PDF] arxiv.org

nnSpeech: Speaker-guided conditional variational autoencoder for zero-shot multi-speaker text-to-speech

B Zhao, X Zhang, J Wang, N Cheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Multi-speaker text-to-speech (TTS) using a few adaption data is a challenge in practical
applications. To address that, we propose a zero-shot multi-speaker TTS, named nnSpeech …

Cited by 21 · Related articles · All 3 versions

[PDF] arxiv.org

Voice filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

A Gabryś, G Huybrechts, MS Ribeiro… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

State-of-the-art text-to-speech (TTS) systems require several hours of recorded speech data
to generate high-quality synthetic speech. When using reduced amounts of training data …

Cited by 24 · Related articles · All 7 versions
