Fast, compact, and high quality LSTM-RNN based statistical parametric speech synthesizers...

C Zhang, P Patras, H Haddadi - IEEE Communications surveys …, 2019 - ieeexplore.ieee.org

The rapid uptake of mobile devices and the rising popularity of mobile applications and
services pose unprecedented demands on mobile and wireless networking infrastructure …

Cited by: 1938 (8 versions)

Speech technology for healthcare: Opportunities, challenges, and state of the art

S Latif, J Qadir, A Qayyum, M Usama… - IEEE Reviews in …, 2020 - ieeexplore.ieee.org

Speech technology is not appropriately explored even though modern advances in speech
technology—especially those driven by deep learning (DL) technology—offer …

Cited by: 161 (3 versions)

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by: 464 (2 versions)

Natural tts synthesis by conditioning wavenet on mel spectrogram predictions

J Shen, R Pang, RJ Weiss, M Schuster… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly
from text. The system is composed of a recurrent sequence-to-sequence feature prediction …

Cited by: 3422 (8 versions)

Style tokens: Unsupervised style modeling, control and transfer in end-to-end speech synthesis

Y Wang, D Stanton, Y Zhang… - International …, 2018 - proceedings.mlr.press

In this work, we propose “global style tokens” (GSTs), a bank of embeddings that are jointly
trained within Tacotron, a state-of-the-art end-to-end speech synthesis system. The …

Cited by: 1023 (7 versions)

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

X Wang, J Yamagishi, M Todisco, H Delgado… - Computer Speech & …, 2020 - Elsevier

Automatic speaker verification (ASV) is one of the most natural and convenient means of
biometric person recognition. Unfortunately, just like all other biometric systems, ASV is …

Cited by: 429 (15 versions)

Tacotron: Towards end-to-end speech synthesis

Y Wang, RJ Skerry-Ryan, D Stanton, Y Wu… - arXiv preprint arXiv …, 2017 - arxiv.org

A text-to-speech synthesis system typically consists of multiple stages, such as a text
analysis frontend, an acoustic model and an audio synthesis module. Building these …

Cited by: 2301 (10 versions)

Char2wav: End-to-end speech synthesis

J Sotelo, S Mehri, K Kumar, JF Santos, K Kastner… - 2017 - openreview.net

We present Char2Wav, an end-to-end model for speech synthesis. Char2Wav has two
components: a reader and a neural vocoder. The reader is an encoder-decoder model with …

Cited by: 539 (4 versions)

Deep voice 2: Multi-speaker neural text-to-speech

A Gibiansky, S Arik, G Diamos, J Miller… - Advances in neural …, 2017 - proceedings.neurips.cc

We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional
trainable speaker embeddings to generate different voices from a single model. As a starting …

Cited by: 448 (8 versions)

Speaker-dependent wavenet vocoder

A Tamamori, T Hayashi, K Kobayashi, K Takeda… - Interspeech, 2017 - isca-archive.org

In this study, we propose a speaker-dependent WaveNet vocoder, a method of synthesizing
speech waveforms with WaveNet, by utilizing acoustic features from existing vocoder as …

Cited by: 342 (5 versions)
