A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these …
Text-to-speech (TTS) has made rapid progress in both academia and industry in recent years. Questions naturally arise as to whether a TTS system can achieve human-level …
In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer. Flowtron borrows …
Y Yasuda, X Wang, S Takaki… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
End-to-end speech synthesis is a promising approach that directly converts raw text to speech. Although it was shown that Tacotron2 outperforms classical pipeline systems with …
Although end-to-end text-to-speech (TTS) models such as Tacotron have shown excellent results, they typically require a sizable set of high-quality <text, audio> pairs for training …
Text-to-Speech (TTS) has recently seen great progress in synthesizing high-quality speech owing to the rapid development of parallel TTS systems, but producing speech with …
We present a novel generative model that combines state-of-the-art neural text-to-speech (TTS) with semi-supervised probabilistic latent variable models. By providing partial …
Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each of which is designed or learnt independently from the rest. In this work, we take on the …
Y Liu, R Xue, L He, X Tan, S Zhao - arXiv preprint arXiv:2207.04646, 2022 - arxiv.org
Current text to speech (TTS) systems usually leverage a cascaded acoustic model and vocoder pipeline with mel-spectrograms as the intermediate representations, which suffer …