Conventional and contemporary approaches used in text to speech synthesis: A review

N Kaur, P Singh - Artificial Intelligence Review, 2023 - Springer
Nowadays speech synthesis or text to speech (TTS), an ability of system to produce human
like natural sounding voice from the written text, is gaining popularity in the field of speech …

A review of deep learning based speech synthesis

Y Ning, S He, Z Wu, C Xing, LJ Zhang - Applied Sciences, 2019 - mdpi.com
Speech synthesis, also known as text-to-speech (TTS), has attracted increasingly more
attention. Recent advances on speech synthesis are overwhelmingly contributed by deep …

A unified architecture for accelerating distributed DNN training in heterogeneous GPU/CPU clusters

Y Jiang, Y Zhu, C Lan, B Yi, Y Cui, C Guo - 14th USENIX Symposium on …, 2020 - usenix.org
Data center clusters that run DNN training jobs are inherently heterogeneous. They have
GPUs and CPUs for computation and network bandwidth for distributed training. However …

LibriTTS: A corpus derived from LibriSpeech for text-to-speech

H Zen, V Dang, R Clark, Y Zhang, RJ Weiss… - arXiv preprint arXiv …, 2019 - arxiv.org
This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech
use. It is derived from the original audio and text materials of the LibriSpeech corpus, which …

WaveGlow: A flow-based generative network for speech synthesis

R Prenger, R Valle, B Catanzaro - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
In this paper we propose WaveGlow: a flow-based network capable of generating high
quality speech from mel-spectrograms. WaveGlow combines insights from Glow [1] and …

Neural speech synthesis with transformer network

N Li, S Liu, Y Liu, S Zhao, M Liu - … of the AAAI conference on artificial …, 2019 - ojs.aaai.org
Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) are proposed
and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency …

Efficient neural audio synthesis

N Kalchbrenner, E Elsen, K Simonyan… - International …, 2018 - proceedings.mlr.press
Sequential models achieve state-of-the-art results in audio, visual and textual domains with
respect to both estimating the data distribution and generating desired samples. Efficient …

Neural voice cloning with a few samples

S Arik, J Chen, K Peng, W Ping… - Advances in neural …, 2018 - proceedings.neurips.cc
Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a
neural voice cloning system that learns to synthesize a person's voice from only a few audio …

Does audio deepfake detection generalize?

NM Müller, P Czempin, F Dieckmann… - arXiv preprint arXiv …, 2022 - arxiv.org
Current text-to-speech algorithms produce realistic fakes of human voices, making deepfake
detection a much-needed area of research. While researchers have presented various …

Highway networks

RK Srivastava, K Greff, J Schmidhuber - arXiv preprint arXiv:1505.00387, 2015 - arxiv.org
There is plenty of theoretical and empirical evidence that depth of neural networks is a
crucial ingredient for their success. However, network training becomes more difficult with …