Vocaine the vocoder and applications in speech synthesis

An overview of voice conversion systems

SH Mohammadi, A Kain - Speech Communication, 2017 - Elsevier

Voice transformation (VT) aims to change one or more aspects of a speech signal while
preserving linguistic information. A subset of VT, Voice conversion (VC) specifically aims to …

Cited by 352 | Related articles | All 6 versions

A review on human-computer interaction and intelligent robots

F Ren, Y Bao - International Journal of Information Technology & …, 2020 - World Scientific

In the field of artificial intelligence, human–computer interaction (HCI) technology and its
related intelligent robot technologies are essential and interesting contents of research …

Cited by 155 | Related articles | All 10 versions

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

X Wang, J Yamagishi, M Todisco, H Delgado… - Computer Speech & …, 2020 - Elsevier

Automatic speaker verification (ASV) is one of the most natural and convenient means of
biometric person recognition. Unfortunately, just like all other biometric systems, ASV is …

Cited by 427 | Related articles | All 15 versions

Tacotron: Towards end-to-end speech synthesis

Y Wang, RJ Skerry-Ryan, D Stanton, Y Wu… - arXiv preprint arXiv …, 2017 - arxiv.org

A text-to-speech synthesis system typically consists of multiple stages, such as a text
analysis frontend, an acoustic model and an audio synthesis module. Building these …

Cited by 2295 | Related articles | All 10 versions

WaveNet: A generative model for raw audio

A Van Den Oord, S Dieleman, H Zen… - arXiv preprint arXiv …, 2016 - academia.edu

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …

Cited by 5945 | Related articles | All 10 versions

Deep voice 3: Scaling text-to-speech with convolutional sequence learning

W Ping, K Peng, A Gibiansky, SO Arik… - arXiv preprint arXiv …, 2017 - arxiv.org

We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS)
system. Deep Voice 3 matches state-of-the-art neural speech synthesis systems in …

Cited by 563 | Related articles | All 5 versions

World: a vocoder-based high-quality speech synthesis system for real-time applications

M Morise, F Yokomori, K Ozawa - IEICE TRANSACTIONS on …, 2016 - search.ieice.org

A vocoder-based speech synthesis system, named WORLD, was developed in an effort to
improve the sound quality of real-time applications using speech. Speech analysis …

Cited by 1535 | Related articles | All 11 versions

Tacotron: A fully end-to-end text-to-speech synthesis model

Y Wang, RJ Skerry-Ryan… - arXiv preprint …, 2017 - bengio.abracadoudou.com

ABSTRACT A text-to-speech synthesis system typically consists of multiple stages, such as a
text analysis frontend, an acoustic model and an audio synthesis module. Building these …

Cited by 294 | Related articles | All 3 versions

Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis

H Zen, H Sak - … Conference on Acoustics, Speech and Signal …, 2015 - ieeexplore.ieee.org

Long short-term memory recurrent neural networks (LSTM-RNNs) have been applied to
various speech applications including acoustic modeling for statistical parametric speech …

Cited by 394 | Related articles | All 12 versions

Synthetic speech detection through short-term and long-term prediction traces

C Borrelli, P Bestagini, F Antonacci, A Sarti… - EURASIP Journal on …, 2021 - Springer

Several methods for synthetic audio speech generation have been developed in the
literature through the years. With the great technological advances brought by deep …

Cited by 86 | Related articles | All 9 versions
