Fastspeech: Fast, robust and controllable text to speech

Y Ren, Y Ruan, X Tan, T Qin, S Zhao… - Advances in neural …, 2019 - proceedings.neurips.cc
Neural network based end-to-end text to speech (TTS) has significantly improved the quality
of synthesized speech. Prominent methods (e.g., Tacotron 2) usually first generate mel …

Libritts: A corpus derived from librispeech for text-to-speech

H Zen, V Dang, R Clark, Y Zhang, RJ Weiss… - arXiv preprint arXiv …, 2019 - arxiv.org
This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech
use. It is derived from the original audio and text materials of the LibriSpeech corpus, which …

ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit

T Hayashi, R Yamamoto, K Inoue… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit named ESPnet-
TTS, which is an extension of the open-source speech processing toolkit ESPnet. The toolkit …

Learning latent representations for style control and transfer in end-to-end speech synthesis

YJ Zhang, S Pan, L He, ZH Ling - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
In this paper, we introduce the Variational Autoencoder (VAE) to an end-to-end speech
synthesis model, to learn the latent representation of speaking styles in an unsupervised …

[PDF][PDF] DurIAN: Duration Informed Attention Network for Speech Synthesis.

C Yu, H Lu, N Hu, M Yu, C Weng, K Xu, P Liu, D Tuo… - Interspeech, 2020 - isca-archive.org
In this paper, we present a robust and effective speech synthesis system that generates
highly natural speech. The key component of the proposed system is Duration Informed …

Flow-TTS: A non-autoregressive network for text to speech based on flow

C Miao, S Liang, M Chen, J Ma… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
In this work, we propose Flow-TTS, a non-autoregressive end-to-end neural TTS model
based on generative flow. Unlike other non-autoregressive models, Flow-TTS can achieve …

The zero resource speech challenge 2019: TTS without T

E Dunbar, R Algayres, J Karadayi, M Bernard… - arXiv preprint arXiv …, 2019 - arxiv.org
We present the Zero Resource Speech Challenge 2019, which proposes to build a speech
synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without …

Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

Almost unsupervised text to speech and automatic speech recognition

Y Ren, X Tan, T Qin, S Zhao… - … on machine learning, 2019 - proceedings.mlr.press
Text to speech (TTS) and automatic speech recognition (ASR) are two dual tasks in speech
processing, and both achieve impressive performance thanks to the recent advance in deep …

Durian: Duration informed attention network for multimodal synthesis

C Yu, H Lu, N Hu, M Yu, C Weng, K Xu, P Liu… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper, we present a generic and robust multimodal synthesis system that produces
highly natural speech and facial expressions simultaneously. The key component of this …