A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Noise level limited sub-modeling for diffusion probabilistic vocoders

T Okamoto, T Toda, Y Shiga… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Although diffusion probabilistic vocoders WaveGrad and DiffWave can realize real-time
high-fidelity speech synthesis with a simple loss function in training, all noise components with …

Full-band LPCNet: A real-time neural vocoder for 48 kHz audio with a CPU

K Matsubara, T Okamoto, R Takashima… - IEEE …, 2021 - ieeexplore.ieee.org
This paper investigates a real-time neural speech synthesis system on CPUs that can
synthesize high-fidelity 48 kHz speech waveforms to cover the entire frequency range …

Parallel synthesis for autoregressive speech generation

P Hsu, D Liu, AT Liu, H Lee - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Autoregressive neural vocoders have achieved outstanding performance in speech
synthesis tasks such as text-to-speech and voice conversion. An autoregressive vocoder …

Deep convolutional neural network for voice liveness detection

S Gupta, K Khoria, AT Patil… - 2021 Asia-Pacific Signal …, 2021 - ieeexplore.ieee.org
In this work, we present the system to detect the liveness by identifying the pop noise in the
voice signal in order to avoid the security breach of ASV systems. Pop noise is created due …

Deep learning-based speaker-adaptive postfiltering with limited adaptation data for embedded text-to-speech synthesis systems

E Eren, C Demiroglu - Computer Speech & Language, 2023 - Elsevier
End-to-end (e2e) speech synthesis systems have become popular with the recent
introduction of text-to-spectrogram conversion systems, such as Tacotron, that use encoder …

Generating the Voice of the Interactive Virtual Assistant

A Stan, B Lőrincz - Virtual Assistant, 2021 - books.google.com
This chapter introduces an overview of the current approaches for generating spoken
content using text-to-speech synthesis (TTS) systems, and thus the voice of an Interactive …

Speeding Up Speech Synthesis In Diffusion Models By Reducing Data Distribution Recovery Steps Via Content Transfer

P Ochieng - arXiv preprint arXiv:2309.09652, 2023 - arxiv.org
Diffusion based vocoders have been criticised for being slow due to the many steps required
during sampling. Moreover, the model's loss function that is popularly implemented is …

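Efficient speech generation: computational efficiency, data efficiency, and their application in speech self-supervised learning

許博竣 - National Taiwan University, Graduate Institute of Communication Engineering, degree thesis, 2024 - tdr.lib.ntu.edu.tw
In recent years, with advances in deep learning, many speech generation models have shown
outstanding performance. Despite these impressive results, the development of speech generation technology has also brought greater demands on computational and data resources …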

Text-to-speech synthesis using spectral modeling based on non-negative autoencoder.

T Gorai, D Saito, N Minematsu - INTERSPEECH, 2022 - isca-archive.org
This paper proposes a statistical parametric speech synthesis system that uses
non-negative autoencoder (NAE) for spectral modeling. NAE is a model that extends non …