T Okamoto, T Toda, Y Shiga… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Although diffusion probabilistic vocoders WaveGrad and DiffWave can realize real-time high-fidelity speech synthesis with a simple loss function in training, all noise components with …
This paper investigates a real-time neural speech synthesis system on CPUs that can synthesize high-fidelity 48 kHz speech waveforms to cover the entire frequency range …
P Hsu, D Liu, AT Liu, H Lee - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Autoregressive neural vocoders have achieved outstanding performance in speech synthesis tasks such as text-to-speech and voice conversion. An autoregressive vocoder …
S Gupta, K Khoria, AT Patil… - 2021 Asia-Pacific Signal …, 2021 - ieeexplore.ieee.org
In this work, we present the system to detect the liveness by identifying the pop noise in the voice signal in order to avoid the security breach of ASV systems. Pop noise is created due …
End-to-end (e2e) speech synthesis systems have become popular with the recent introduction of text-to-spectrogram conversion systems, such as Tacotron, that use encoder …
A Stan, B Lőrincz - Virtual Assistant, 2021 - books.google.com
This chapter introduces an overview of the current approaches for generating spoken content using text-to-speech synthesis (TTS) systems, and thus the voice of an Interactive …
P Ochieng - arXiv preprint arXiv:2309.09652, 2023 - arxiv.org
Diffusion based vocoders have been criticised for being slow due to the many steps required during sampling. Moreover, the model's loss function that is popularly implemented is …
T Gorai, D Saito, N Minematsu - INTERSPEECH, 2022 - isca-archive.org
This paper proposes a statistical parametric speech synthesis system that uses non-negative autoencoder (NAE) for spectral modeling. NAE is a model that extends non …