A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Sixty years of frequency-domain monaural speech enhancement: From traditional to deep learning methods

C Zheng, H Zhang, W Liu, X Luo, A Li, X Li… - Trends in …, 2023 - journals.sagepub.com
Frequency-domain monaural speech enhancement has been extensively studied for over
60 years, and a great number of methods have been proposed and applied to many …

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement

Y Hu, Y Liu, S Lv, M Xing, S Zhang, Y Fu, J Wu… - arXiv preprint arXiv …, 2020 - arxiv.org
Speech enhancement has benefited from the success of deep learning in terms of
intelligibility and perceptual quality. Conventional time-frequency (TF) domain methods …

Icassp 2023 deep noise suppression challenge

H Dubey, A Aazami, V Gopal, B Naderi… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
The ICASSP 2023 Deep Noise Suppression (DNS) Challenge marks the fifth edition of the
DNS challenge series. DNS challenges were organized from 2019 to 2023 to foster …

Real time speech enhancement in the waveform domain

A Defossez, G Synnaeve, Y Adi - arXiv preprint arXiv:2006.12847, 2020 - arxiv.org
We present a causal speech enhancement model working on the raw waveform that runs in
real-time on a laptop CPU. The proposed model is based on an encoder-decoder …

Speech enhancement and dereverberation with diffusion-based generative models

J Richter, S Welker, JM Lemercier… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In this work, we build upon our previous publication and use diffusion-based generative
models for speech enhancement. We present a detailed overview of the diffusion process …

DNSMOS: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors

CKA Reddy, V Gopal, R Cutler - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Human subjective evaluation is the" gold standard" to evaluate speech quality optimized for
human perception. Perceptual objective metrics serve as a proxy for subjective scores. The …

NISQA: A deep CNN-self-attention model for multidimensional speech quality prediction with crowdsourced datasets

G Mittag, B Naderi, A Chehadi, S Möller - arXiv preprint arXiv:2104.09494, 2021 - arxiv.org
In this paper, we present an update to the NISQA speech quality prediction model that is
focused on distortions that occur in communication networks. In contrast to the previous …

Fullsubnet: A full-band and sub-band fusion model for real-time single-channel speech enhancement

X Hao, X Su, R Horaud, X Li - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
This paper proposes a full-band and sub-band fusion model, named as FullSubNet, for
single-channel real-time speech enhancement. Full-band and sub-band refer to the models …