WG-WaveNet: Real-time high-fidelity speech synthesis without GPU

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 464 · Related articles · All 2 versions

[PDF] nict.go.jp

Noise level limited sub-modeling for diffusion probabilistic vocoders

T Okamoto, T Toda, Y Shiga… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

Although diffusion probabilistic vocoders WaveGrad and DiffWave can realize real-time high-
fidelity speech synthesis with a simple loss function in training, all noise components with …

Cited by 15 · Related articles · All 3 versions

[PDF] ieee.org

Full-band LPCNet: A real-time neural vocoder for 48 kHz audio with a CPU

K Matsubara, T Okamoto, R Takashima… - IEEE …, 2021 - ieeexplore.ieee.org

This paper investigates a real-time neural speech synthesis system on CPUs that can
synthesize high-fidelity 48 kHz speech waveforms to cover the entire frequency range …

Cited by 17 · Related articles · All 5 versions

[PDF] arxiv.org

Parallel synthesis for autoregressive speech generation

P Hsu, D Liu, AT Liu, H Lee - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org

Autoregressive neural vocoders have achieved outstanding performance in speech
synthesis tasks such as text-to-speech and voice conversion. An autoregressive vocoder …

Cited by 4 · Related articles · All 4 versions

[PDF] apsipa.org

Deep convolutional neural network for voice liveness detection

S Gupta, K Khoria, AT Patil… - 2021 Asia-Pacific Signal …, 2021 - ieeexplore.ieee.org

In this work, we present the system to detect the liveness by identifying the pop noise in the
voice signal in order to avoid the security breach of ASV systems. Pop noise is created due …

Cited by 9 · Related articles · All 2 versions

Deep learning-based speaker-adaptive postfiltering with limited adaptation data for embedded text-to-speech synthesis systems

E Eren, C Demiroglu - Computer Speech & Language, 2023 - Elsevier

End-to-end (e2e) speech synthesis systems have become popular with the recent
introduction of text-to-spectrogram conversion systems, such as Tacotron, that use encoder …

Cited by 4 · Related articles · All 3 versions

[HTML] intechopen.com

Generating the Voice of the Interactive Virtual Assistant

A Stan, B Lőrincz - Virtual Assistant, 2021 - books.google.com

This chapter introduces an overview of the current approaches for generating spoken
content using text-to-speech synthesis (TTS) systems, and thus the voice of an Interactive …

Cited by 8 · Related articles · All 2 versions

[PDF] arxiv.org

Cited by 1 · Related articles · All 5 versions
