Deep learning in mobile and wireless networking: A survey

C Zhang, P Patras, H Haddadi - IEEE Communications surveys …, 2019 - ieeexplore.ieee.org
The rapid uptake of mobile devices and the rising popularity of mobile applications and
services pose unprecedented demands on mobile and wireless networking infrastructure …

Speech technology for healthcare: Opportunities, challenges, and state of the art

S Latif, J Qadir, A Qayyum, M Usama… - IEEE Reviews in …, 2020 - ieeexplore.ieee.org
Speech technology is not appropriately explored even though modern advances in speech
technology—especially those driven by deep learning (DL) technology—offer …

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Natural tts synthesis by conditioning wavenet on mel spectrogram predictions

J Shen, R Pang, RJ Weiss, M Schuster… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
This paper describes Tacotron 2, a neural network architecture for speech synthesis directly
from text. The system is composed of a recurrent sequence-to-sequence feature prediction …

Style tokens: Unsupervised style modeling, control and transfer in end-to-end speech synthesis

Y Wang, D Stanton, Y Zhang… - International …, 2018 - proceedings.mlr.press
In this work, we propose “global style tokens”(GSTs), a bank of embeddings that are jointly
trained within Tacotron, a state-of-the-art end-to-end speech synthesis system. The …

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

X Wang, J Yamagishi, M Todisco, H Delgado… - Computer Speech & …, 2020 - Elsevier
Automatic speaker verification (ASV) is one of the most natural and convenient means of
biometric person recognition. Unfortunately, just like all other biometric systems, ASV is …

Tacotron: Towards end-to-end speech synthesis

Y Wang, RJ Skerry-Ryan, D Stanton, Y Wu… - arXiv preprint arXiv …, 2017 - arxiv.org
A text-to-speech synthesis system typically consists of multiple stages, such as a text
analysis frontend, an acoustic model and an audio synthesis module. Building these …

Char2wav: End-to-end speech synthesis

J Sotelo, S Mehri, K Kumar, JF Santos, K Kastner… - 2017 - openreview.net
We present Char2Wav, an end-to-end model for speech synthesis. Char2Wav has two
components: a reader and a neural vocoder. The reader is an encoder-decoder model with …

Deep voice 2: Multi-speaker neural text-to-speech

A Gibiansky, S Arik, G Diamos, J Miller… - Advances in neural …, 2017 - proceedings.neurips.cc
We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional
trainable speaker embeddings to generate different voices from a single model. As a starting …

[PDF][PDF] Speaker-dependent wavenet vocoder.

A Tamamori, T Hayashi, K Kobayashi, K Takeda… - Interspeech, 2017 - isca-archive.org
In this study, we propose a speaker-dependent WaveNet vocoder, a method of synthesizing
speech waveforms with WaveNet, by utilizing acoustic features from existing vocoder as …