Generative adversarial networks for speech processing: A review

A Wali, Z Alamgir, S Karim, A Fawaz, MB Ali… - Computer Speech & …, 2022 - Elsevier
Generative adversarial networks (GANs) have seen remarkable progress in recent years.
They are used as generative models for all kinds of data such as text, images, audio, music …

Real time speech enhancement in the waveform domain

A Defossez, G Synnaeve, Y Adi - arXiv preprint arXiv:2006.12847, 2020 - arxiv.org
We present a causal speech enhancement model working on the raw waveform that runs in
real-time on a laptop CPU. The proposed model is based on an encoder-decoder …

Conditional diffusion probabilistic model for speech enhancement

YJ Lu, ZQ Wang, S Watanabe… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Speech enhancement is a critical component of many user-oriented audio applications, yet
current systems still suffer from distorted and unnatural outputs. While generative models …

Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural network (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

Universal speech enhancement with score-based diffusion

J Serrà, S Pascual, J Pons, RO Araz… - arXiv preprint arXiv …, 2022 - arxiv.org
Removing background noise from speech audio has been the subject of considerable effort,
especially in recent years due to the rise of virtual communication and amateur recordings …

HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks

J Su, Z Jin, A Finkelstein - arXiv preprint arXiv:2006.05694, 2020 - arxiv.org
Real-world audio recordings are often degraded by factors such as noise, reverberation,
and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to …

Speech denoising in the waveform domain with self-attention

Z Kong, W Ping, A Dantrey… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
In this work, we present CleanUNet, a causal speech denoising model on the raw waveform.
The proposed model is based on an encoder-decoder architecture combined with several …

CMGAN: Conformer-based metric-GAN for monaural speech enhancement

S Abdulatif, R Cao, B Yang - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
In this work, we further develop the conformer-based metric generative adversarial network
(CMGAN) model for speech enhancement (SE) in the time-frequency (TF) domain. This …

A time-frequency attention module for neural speech enhancement

Q Zhang, X Qian, Z Ni, A Nicolson… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
Speech enhancement plays an essential role in a wide range of speech processing
applications. Recent studies on speech enhancement tend to investigate how to effectively …

Revisiting denoising diffusion probabilistic models for speech enhancement: Condition collapse, efficiency and refinement

W Tai, F Zhou, G Trajcevski, T Zhong - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Recent literature has shown that denoising diffusion probabilistic models (DDPMs) can be
used to synthesize high-fidelity samples with a competitive (or sometimes better) quality than …