Generative adversarial networks for speech processing: A review

A Wali, Z Alamgir, S Karim, A Fawaz, MB Ali… - Computer Speech & …, 2022 - Elsevier
Generative adversarial networks (GANs) have seen remarkable progress in recent years.
They are used as generative models for all kinds of data such as text, images, audio, music …

Real time speech enhancement in the waveform domain

A Defossez, G Synnaeve, Y Adi - arXiv preprint arXiv:2006.12847, 2020 - arxiv.org
We present a causal speech enhancement model working on the raw waveform that runs in
real-time on a laptop CPU. The proposed model is based on an encoder-decoder …

Metricgan+: An improved version of metricgan for speech enhancement

SW Fu, C Yu, TA Hsieh, P Plantinga… - arXiv preprint arXiv …, 2021 - arxiv.org
The discrepancy between the cost function used for training a speech enhancement model
and human auditory perception usually makes the quality of enhanced speech …

[PDF][PDF] Speech enhancement using deep learning methods: A review

AR Yuliani, MF Amri, E Suryawati… - Jurnal Elektronika …, 2021 - pdfs.semanticscholar.org
Speech enhancement, which aims to recover the clean speech of the corrupted signal, plays
an important role in the digital speech signal processing. According to the type of …

Metricgan: Generative adversarial networks based black-box metric scores optimization for speech enhancement

SW Fu, CF Liao, Y Tsao, SD Lin - … Conference on Machine …, 2019 - proceedings.mlr.press
Adversarial loss in a conditional generative adversarial network (GAN) is not designed to
directly optimize evaluation metrics of a target task, and thus, may not always guide the …

Speech enhancement with score-based generative models in the complex STFT domain

S Welker, J Richter, T Gerkmann - arXiv preprint arXiv:2203.17004, 2022 - arxiv.org
Score-based generative models (SGMs) have recently shown impressive results for difficult
generative tasks such as the unconditional and conditional generation of natural images …

On loss functions for supervised monaural time-domain speech enhancement

M Kolbæk, ZH Tan, SH Jensen… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Many deep learning-based speech enhancement algorithms are designed to minimize the
mean-square error (MSE) in some transform domain between a predicted and a target …

Self-supervised visual acoustic matching

A Somayazulu, C Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a
target acoustic environment. Existing methods assume access to paired training data, where …

Speech denoising in the waveform domain with self-attention

Z Kong, W Ping, A Dantrey… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
In this work, we present CleanUNet, a causal speech denoising model on the raw waveform.
The proposed model is based on an encoder-decoder architecture combined with several …

Improving perceptual quality by phone-fortified perceptual loss using wasserstein distance for speech enhancement

TA Hsieh, C Yu, SW Fu, X Lu, Y Tsao - arXiv preprint arXiv:2010.15174, 2020 - arxiv.org
Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both
related to a smooth transition in speech segments that may carry linguistic information, eg …