AdaDurIAN: Few-shot adaptation for neural text-to-speech with DurIAN

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 457 · Related articles · All 2 versions

AdaSpeech: Adaptive text to speech for custom voice

M Chen, X Tan, B Li, Y Liu, T Qin, S Zhao… - arXiv preprint arXiv …, 2021 - arxiv.org

Custom voice, a specific text to speech (TTS) service in commercial speech platforms, aims
to adapt a source TTS model to synthesize personal voice for a target speaker using few …

Cited by 192 · Related articles · All 3 versions

Review of end-to-end speech synthesis technology based on deep learning

Z Mu, X Yang, Y Dong - arXiv preprint arXiv:2104.09995, 2021 - arxiv.org

As an indispensable part of modern human-computer interaction system, speech synthesis
technology helps users get the output of intelligent machine more easily and intuitively, thus …

Cited by 43 · Related articles · All 2 versions

Non-Attentive Tacotron: Robust and controllable neural TTS synthesis including unsupervised duration modeling

J Shen, Y Jia, M Chrzanowski, Y Zhang, I Elias… - arXiv preprint arXiv …, 2020 - arxiv.org

This paper presents Non-Attentive Tacotron based on the Tacotron 2 text-to-speech model,
replacing the attention mechanism with an explicit duration predictor. This improves …

Cited by 102 · Related articles · All 4 versions
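The Non-Attentive Tacotron snippet describes replacing attention with an explicit duration predictor. Once per-phoneme durations are predicted, the phoneme-level encoder outputs still have to be upsampled to frame rate. The sketch below uses the simplest such scheme, a hard repeat-based length regulator of the kind used in FastSpeech. This is a swapped-in technique for illustration, not the Gaussian upsampling that the Non-Attentive Tacotron paper itself uses. The function names `round_durations` and `length_regulate` are hypothetical, and the "encoder states" are plain Python values standing in for vectors.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def round_durations(predicted: Sequence[float]) -> List[int]:
    """Convert real-valued duration predictions into non-negative frame counts."""
    # Note: Python's round() uses round-half-to-even (e.g. round(2.5) == 2).
    return [max(0, int(round(p))) for p in predicted]


def length_regulate(states: Sequence[T], durations: Sequence[int]) -> List[T]:
    """Hard upsampling: repeat each phoneme-level state durations[i] times."""
    if len(states) != len(durations):
        raise ValueError("expected exactly one duration per encoder state")
    frames: List[T] = []
    for state, d in zip(states, durations):
        if d < 0:
            raise ValueError("durations must be non-negative")
        frames.extend([state] * d)  # a zero duration drops the state entirely
    return frames


if __name__ == "__main__":
    phonemes = ["h", "e", "l", "o"]  # stand-ins for encoder outputs
    durations = round_durations([1.2, 2.7, 0.4, 3.1])
    print(durations)  # [1, 3, 0, 3]
    print(length_regulate(phonemes, durations))  # ['h', 'e', 'e', 'e', 'o', 'o', 'o']
```

Because the output length is fixed by the durations rather than by a learned alignment, this kind of model cannot skip or repeat words the way attention failures can, which is the robustness argument behind duration-based TTS.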

GANSpeech: Adversarial training for high-fidelity multi-speaker speech synthesis

J Yang, JS Bae, T Bak, Y Kim, HY Cho - arXiv preprint arXiv:2106.15153, 2021 - arxiv.org

Recent advances in neural multi-speaker text-to-speech (TTS) models have enabled the
generation of reasonably good speech quality with a single model and made it possible to …

Cited by 39 · Related articles · All 6 versions

Accented text-to-speech synthesis with limited data

X Zhou, M Zhang, Y Zhou, Z Wu… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

This paper presents an accented text-to-speech (TTS) synthesis framework with limited
training data. We study two aspects concerning accent rendering: phonetic (phoneme …

Cited by 9 · Related articles · All 4 versions

EMOVIE: A Mandarin emotion speech dataset with a simple emotional text-to-speech model

C Cui, Y Ren, J Liu, F Chen, R Huang, M Lei… - arXiv preprint arXiv …, 2021 - arxiv.org

Recently, there has been an increasing interest in neural speech synthesis. While the deep
neural network achieves the state-of-the-art result in text-to-speech (TTS) tasks, how to …

Cited by 36 · Related articles · All 7 versions

Non-autoregressive TTS with explicit duration modelling for low-resource highly expressive speech

R Shah, K Pokora, A Ezzerg, V Klimkov… - arXiv preprint arXiv …, 2021 - arxiv.org

Whilst recent neural text-to-speech (TTS) approaches produce high-quality speech, they
typically require a large amount of recordings from the target speaker. In previous work, a 3 …

Cited by 30 · Related articles · All 8 versions

Residual adapters for few-shot text-to-speech speaker adaptation

N Morioka, H Zen, N Chen, Y Zhang, Y Ding - arXiv preprint arXiv …, 2022 - arxiv.org

Adapting a neural text-to-speech (TTS) model to a target speaker typically involves fine-tuning
most if not all of the parameters of a pretrained multi-speaker backbone model …

Cited by 13 · Related articles · All 2 versions
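The snippet contrasts residual adapters with fine-tuning most or all of a pretrained backbone's parameters. A minimal pure-Python sketch of a generic bottleneck residual adapter follows, assuming the common down-projection, ReLU, up-projection and residual-add design with a zero-initialized up-projection. The class name `ResidualAdapter`, the helper `matvec`, and the `dim`/`bottleneck`/`seed` parameters are hypothetical; the paper's actual adapter placement, normalization and sizes are not reproduced here.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ResidualAdapter:
    """Bottleneck adapter: y = h + W_up @ relu(W_down @ h).

    Hypothetical sketch: only w_down and w_up would be trained during speaker
    adaptation; the pretrained backbone that produces h stays frozen.
    """

    def __init__(self, dim: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random down-projection (dim -> bottleneck).
        self.w_down: Matrix = [
            [rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)
        ]
        # Zero-initialized up-projection (bottleneck -> dim), an assumed choice
        # that makes the adapter an exact identity map before adaptation starts.
        self.w_up: Matrix = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, a) for a in matvec(self.w_down, h)]  # ReLU bottleneck
        delta = matvec(self.w_up, z)  # project back up to dim
        return [hi + di for hi, di in zip(h, delta)]  # residual connection

    def num_params(self) -> int:
        return sum(map(len, self.w_down)) + sum(map(len, self.w_up))


if __name__ == "__main__":
    adapter = ResidualAdapter(dim=4, bottleneck=2)
    h = [1.0, -2.0, 0.5, 3.0]
    print(adapter(h) == h)  # True: identity at initialization
    print(ResidualAdapter(256, 16).num_params())  # 8192
```

With `dim=256` and `bottleneck=16` the adapter holds 8,192 trainable weights, compared with 65,536 for a single dense 256×256 layer, so per-speaker storage and the number of parameters updated during adaptation stay small.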

Voice filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

A Gabryś, G Huybrechts, MS Ribeiro… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

State-of-the-art text-to-speech (TTS) systems require several hours of recorded speech data
to generate high-quality synthetic speech. When using reduced amounts of training data …

Cited by 23 · Related articles · All 7 versions