Neural speech synthesis with transformer network

N Li, S Liu, Y Liu, S Zhao, M Liu - … of the AAAI conference on artificial …, 2019 - ojs.aaai.org
Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) are proposed
and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency …

Close to human quality TTS with transformer

N Li, S Liu, Y Liu, S Zhao, M Liu… - arXiv preprint arXiv …, 2018 - academia.edu
Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) are proposed
and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency …

DurIAN: Duration Informed Attention Network for Speech Synthesis

C Yu, H Lu, N Hu, M Yu, C Weng, K Xu, P Liu, D Tuo… - Interspeech, 2020 - isca-archive.org
In this paper, we present a robust and effective speech synthesis system that generates
highly natural speech. The key component of proposed system is Duration Informed …

LPCNet: Improving neural speech synthesis through linear prediction

JM Valin, J Skoglund - ICASSP 2019-2019 IEEE International …, 2019 - ieeexplore.ieee.org
Neural speech synthesis models have recently demonstrated the ability to synthesize high
quality speech for text-to-speech and compression applications. These new models often …

Robutrans: A robust transformer-based text-to-speech model

N Li, Y Liu, Y Wu, S Liu, S Zhao, M Liu - Proceedings of the AAAI …, 2020 - ojs.aaai.org
Recently, neural network based speech synthesis has achieved outstanding results, by
which the synthesized audios are of excellent quality and naturalness. However, current …

Semi-supervised training for improving data efficiency in end-to-end speech synthesis

YA Chung, Y Wang, WN Hsu, Y Zhang… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
Although end-to-end text-to-speech (TTS) models such as Tacotron have shown excellent
results, they typically require a sizable set of high-quality <text, audio> pairs for training …

Review of end-to-end speech synthesis technology based on deep learning

Z Mu, X Yang, Y Dong - arXiv preprint arXiv:2104.09995, 2021 - arxiv.org
As an indispensable part of modern human-computer interaction system, speech synthesis
technology helps users get the output of intelligent machine more easily and intuitively, thus …

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

A review of deep learning based speech synthesis

Y Ning, S He, Z Wu, C Xing, LJ Zhang - Applied Sciences, 2019 - mdpi.com
Speech synthesis, also known as text-to-speech (TTS), has attracted increasingly more
attention. Recent advances on speech synthesis are overwhelmingly contributed by deep …

JETS: Jointly training FastSpeech2 and HiFi-GAN for end to end text to speech

D Lim, S Jung, E Kim - arXiv preprint arXiv:2203.16852, 2022 - arxiv.org
In neural text-to-speech (TTS), two-stage system or a cascade of separately learned models
have shown synthesis quality close to human speech. For example, FastSpeech2 transforms …