Tacotron: A fully end-to-end text-to-speech synthesis model

N Kaur, P Singh - Artificial Intelligence Review, 2023 - Springer

Nowadays speech synthesis or text to speech (TTS), the ability of a system to produce human-like,
natural-sounding voice from written text, is gaining popularity in the field of speech …

Cited by 58 - Related articles - All 3 versions

[PDF] mdpi.com

A review of deep learning based speech synthesis

Y Ning, S He, Z Wu, C Xing, LJ Zhang - Applied Sciences, 2019 - mdpi.com

Speech synthesis, also known as text-to-speech (TTS), has attracted increasingly more
attention. Recent advances on speech synthesis are overwhelmingly contributed by deep …

Cited by 228 - Related articles - All 6 versions

[PDF] usenix.org

A unified architecture for accelerating distributed DNN training in heterogeneous GPU/CPU clusters

Y Jiang, Y Zhu, C Lan, B Yi, Y Cui, C Guo - 14th USENIX Symposium on …, 2020 - usenix.org

Data center clusters that run DNN training jobs are inherently heterogeneous. They have
GPUs and CPUs for computation and network bandwidth for distributed training. However …

Cited by 343 - Related articles - All 10 versions

[PDF] arxiv.org

Libritts: A corpus derived from librispeech for text-to-speech

H Zen, V Dang, R Clark, Y Zhang, RJ Weiss… - arXiv preprint arXiv …, 2019 - arxiv.org

This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech
use. It is derived from the original audio and text materials of the LibriSpeech corpus, which …

Cited by 1030 - Related articles - All 10 versions

[PDF] arxiv.org

Waveglow: A flow-based generative network for speech synthesis

R Prenger, R Valle, B Catanzaro - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org

In this paper we propose WaveGlow: a flow-based network capable of generating high
quality speech from mel-spectrograms. WaveGlow combines insights from Glow [1] and …

Cited by 1315 - Related articles - All 6 versions

[PDF] aaai.org

Neural speech synthesis with transformer network

N Li, S Liu, Y Liu, S Zhao, M Liu - … of the AAAI conference on artificial …, 2019 - ojs.aaai.org

Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) are proposed
and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency …

Cited by 916 - Related articles - All 10 versions

[PDF] mlr.press

Efficient neural audio synthesis

N Kalchbrenner, E Elsen, K Simonyan… - International …, 2018 - proceedings.mlr.press

Sequential models achieve state-of-the-art results in audio, visual and textual domains with
respect to both estimating the data distribution and generating desired samples. Efficient …

Cited by 1064 - Related articles - All 8 versions

[PDF] neurips.cc

Neural voice cloning with a few samples

S Arik, J Chen, K Peng, W Ping… - Advances in neural …, 2018 - proceedings.neurips.cc

Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a
neural voice cloning system that learns to synthesize a person's voice from only a few audio …

Cited by 480 - Related articles - All 12 versions

[PDF] arxiv.org

Does audio deepfake detection generalize?

NM Müller, P Czempin, F Dieckmann… - arXiv preprint arXiv …, 2022 - arxiv.org

Current text-to-speech algorithms produce realistic fakes of human voices, making deepfake
detection a much-needed area of research. While researchers have presented various …

Cited by 169 - Related articles - All 8 versions

[PDF] psu.edu

Highway networks

RK Srivastava, K Greff, J Schmidhuber - arXiv preprint arXiv:1505.00387, 2015 - arxiv.org

There is plenty of theoretical and empirical evidence that depth of neural networks is a
crucial ingredient for their success. However, network training becomes more difficult with …

Cited by 2729 - Related articles - All 3 versions
