An overview of voice conversion systems

SH Mohammadi, A Kain - Speech Communication, 2017 - Elsevier
Voice transformation (VT) aims to change one or more aspects of a speech signal while
preserving linguistic information. A subset of VT, Voice conversion (VC) specifically aims to …

A review on human-computer interaction and intelligent robots

F Ren, Y Bao - International Journal of Information Technology & …, 2020 - World Scientific
In the field of artificial intelligence, human–computer interaction (HCI) technology and its
related intelligent robot technologies are essential and interesting contents of research …

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

X Wang, J Yamagishi, M Todisco, H Delgado… - Computer Speech & …, 2020 - Elsevier
Automatic speaker verification (ASV) is one of the most natural and convenient means of
biometric person recognition. Unfortunately, just like all other biometric systems, ASV is …

Tacotron: Towards end-to-end speech synthesis

Y Wang, RJ Skerry-Ryan, D Stanton, Y Wu… - arXiv preprint arXiv …, 2017 - arxiv.org
A text-to-speech synthesis system typically consists of multiple stages, such as a text
analysis frontend, an acoustic model and an audio synthesis module. Building these …

[PDF][PDF] Wavenet: A generative model for raw audio

A Van Den Oord, S Dieleman, H Zen… - arXiv preprint arXiv …, 2016 - academia.edu
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …

Deep voice 3: Scaling text-to-speech with convolutional sequence learning

W Ping, K Peng, A Gibiansky, SO Arik… - arXiv preprint arXiv …, 2017 - arxiv.org
We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS)
system. Deep Voice 3 matches state-of-the-art neural speech synthesis systems in …

World: a vocoder-based high-quality speech synthesis system for real-time applications

M Morise, F Yokomori, K Ozawa - IEICE TRANSACTIONS on …, 2016 - search.ieice.org
A vocoder-based speech synthesis system, named WORLD, was developed in an effort to
improve the sound quality of real-time applications using speech. Speech analysis …

[PDF][PDF] Tacotron: A fully end-to-end text-to-speech synthesis model

Y Wang, RJ Skerry-Ryan… - arXiv preprint …, 2017 - bengio.abracadoudou.com
ABSTRACT A text-to-speech synthesis system typically consists of multiple stages, such as a
text analysis frontend, an acoustic model and an audio synthesis module. Building these …

Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis

H Zen, H Sak - … Conference on Acoustics, Speech and Signal …, 2015 - ieeexplore.ieee.org
Long short-term memory recurrent neural networks (LSTM-RNNs) have been applied to
various speech applications including acoustic modeling for statistical parametric speech …

Synthetic speech detection through short-term and long-term prediction traces

C Borrelli, P Bestagini, F Antonacci, A Sarti… - EURASIP Journal on …, 2021 - Springer
Several methods for synthetic audio speech generation have been developed in the
literature through the years. With the great technological advances brought by deep …