Survey of deep learning paradigms for speech processing

KB Bhangale, M Kothandaraman - Wireless Personal Communications, 2022 - Springer
Over the past decades, particular focus has been given to research on machine learning
techniques for speech processing applications. However, in the past few years, research …

Deep learning for environmentally robust speech recognition: An overview of recent developments

Z Zhang, J Geiger, J Pohjalainen, AED Mousa… - ACM Transactions on …, 2018 - dl.acm.org
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …

Attentive statistics pooling for deep speaker embedding

K Okabe, T Koshinaka, K Shinoda - arXiv preprint arXiv:1803.10963, 2018 - arxiv.org
This paper proposes attentive statistics pooling for deep speaker embedding in text-
independent speaker verification. In conventional speaker embedding, frame-level features …

Fullsubnet: A full-band and sub-band fusion model for real-time single-channel speech enhancement

X Hao, X Su, R Horaud, X Li - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
This paper proposes a full-band and sub-band fusion model, named FullSubNet, for
single-channel real-time speech enhancement. Full-band and sub-band refer to the models …

Soundspaces 2.0: A simulation platform for visual-acoustic learning

C Chen, C Schissler, S Garg… - Advances in …, 2022 - proceedings.neurips.cc
Abstract We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio
rendering for 3D environments. Given a 3D mesh of a real-world environment …

Detection and classification of acoustic scenes and events: Outcome of the DCASE 2016 challenge

A Mesaros, T Heittola, E Benetos… - … on Audio, Speech …, 2017 - ieeexplore.ieee.org
Public evaluation campaigns and datasets promote active development in target research
areas, allowing direct comparison of algorithms. The second edition of the challenge on …

CMGAN: Conformer-based metric GAN for speech enhancement

R Cao, S Abdulatif, B Yang - arXiv preprint arXiv:2203.15149, 2022 - arxiv.org
Recently, convolution-augmented transformer (Conformer) has achieved promising
performance in automatic speech recognition (ASR) and time-domain speech enhancement …

TF-GridNet: Integrating full- and sub-band modeling for speech separation

ZQ Wang, S Cornell, S Choi, Y Lee… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
We propose TF-GridNet for speech separation. The model is a novel deep neural network
(DNN) integrating full- and sub-band modeling in the time-frequency (TF) domain. It stacks …

HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks

J Su, Z Jin, A Finkelstein - arXiv preprint arXiv:2006.05694, 2020 - arxiv.org
Real-world audio recordings are often degraded by factors such as noise, reverberation,
and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to …

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …