Attention is all you need in speech separation

C Subakan, M Ravanelli, S Cornell… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Recurrent Neural Networks (RNNs) have long been the dominant architecture in sequence-
to-sequence learning. RNNs, however, are inherently sequential models that do not allow …
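
Since the snippet is cut off, here is a minimal sketch of the core idea behind this line of work: replace the sequential RNN with self-attention applied in a dual-path (intra-chunk / inter-chunk) fashion so that all time steps are processed in parallel. The module, dimensions, and chunking below are illustrative placeholders, not the authors' SepFormer code.

```python
# Minimal sketch (not the authors' code): dual-path processing where
# self-attention layers replace RNNs, so chunks are processed in parallel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPathAttentionBlock(nn.Module):
    def __init__(self, dim=64, n_heads=4, chunk_len=100):
        super().__init__()
        self.chunk_len = chunk_len
        # intra-chunk attention models short-term structure,
        # inter-chunk attention models long-range dependencies
        self.intra = nn.TransformerEncoderLayer(dim, n_heads, dim_feedforward=4 * dim, batch_first=True)
        self.inter = nn.TransformerEncoderLayer(dim, n_heads, dim_feedforward=4 * dim, batch_first=True)

    def forward(self, x):                       # x: (batch, time, dim)
        b, t, d = x.shape
        pad = (-t) % self.chunk_len
        x = F.pad(x, (0, 0, 0, pad))            # pad time axis to a chunk multiple
        n = x.shape[1] // self.chunk_len
        x = x.reshape(b, n, self.chunk_len, d)
        # intra-chunk: attend within each chunk
        x = self.intra(x.reshape(b * n, self.chunk_len, d)).reshape(b, n, self.chunk_len, d)
        # inter-chunk: attend across chunks at the same intra-chunk position
        x = x.transpose(1, 2).reshape(b * self.chunk_len, n, d)
        x = self.inter(x).reshape(b, self.chunk_len, n, d).transpose(1, 2)
        return x.reshape(b, n * self.chunk_len, d)[:, :t]
```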

Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural network (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

Two heads are better than one: A two-stage complex spectral mapping approach for monaural speech enhancement

A Li, W Liu, C Zheng, C Fan, X Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
For challenging acoustic scenarios such as low signal-to-noise ratios, current speech
enhancement systems usually suffer from a performance bottleneck in extracting the target …
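
A loose sketch of the two-stage idea suggested by the title, under the assumption that a first stage coarsely estimates the magnitude spectrum and a second stage refines the complex (real/imaginary) spectrum on top of that coarse estimate; the tiny convolutional stacks are placeholders, not the paper's sub-networks.

```python
# Loose sketch of a two-stage complex spectral mapping pipeline; the small conv
# stacks stand in for the paper's actual sub-networks.
import torch
import torch.nn as nn

def small_net(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, out_ch, 3, padding=1),
    )

class TwoStageEnhancer(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = small_net(1, 1)   # coarse magnitude estimation
        self.stage2 = small_net(3, 2)   # complex (real/imag) refinement

    def forward(self, noisy_real, noisy_imag):          # each: (batch, freq, time)
        noisy_mag = torch.sqrt(noisy_real**2 + noisy_imag**2 + 1e-8)
        # Stage 1: predict a magnitude mask and a coarse magnitude estimate.
        mask = torch.sigmoid(self.stage1(noisy_mag.unsqueeze(1)))
        coarse_mag = mask.squeeze(1) * noisy_mag
        # The coarse complex spectrum keeps the noisy phase.
        phase = torch.atan2(noisy_imag, noisy_real)
        coarse_real = coarse_mag * torch.cos(phase)
        coarse_imag = coarse_mag * torch.sin(phase)
        # Stage 2: predict a residual that corrects the complex spectrum.
        inp = torch.stack([coarse_real, coarse_imag, noisy_mag], dim=1)
        resid = self.stage2(inp)
        return coarse_real + resid[:, 0], coarse_imag + resid[:, 1]
```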

Glance and gaze: A collaborative learning framework for single-channel speech enhancement

A Li, C Zheng, L Zhang, X Li - Applied Acoustics, 2022 - Elsevier
The human capability to pay attention to both coarse and fine-grained regions has
been applied to computer vision tasks. Motivated by that, we propose a collaborative …

TF-GridNet: Integrating full- and sub-band modeling for speech separation

ZQ Wang, S Cornell, S Choi, Y Lee… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
We propose TF-GridNet for speech separation. The model is a novel deep neural network
(DNN) integrating full- and sub-band modeling in the time-frequency (TF) domain. It stacks …
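
The snippet mentions stacking full- and sub-band modules in the TF domain; the block below sketches that alternation only (per-frequency modeling along time, then per-frame modeling along frequency, each with a residual connection). It is a simplification that omits parts of the published design, such as the cross-frame self-attention.

```python
# Minimal sketch of alternating sub-band (along time, per frequency) and
# full-band (along frequency, per frame) modeling on a TF representation.
# A simplification, not the reference TF-GridNet implementation.
import torch
import torch.nn as nn

class FullSubBandBlock(nn.Module):
    def __init__(self, dim=48, hidden=96):
        super().__init__()
        self.sub_band = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.full_band = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.sub_proj = nn.Linear(2 * hidden, dim)
        self.full_proj = nn.Linear(2 * hidden, dim)

    def forward(self, x):                                # x: (batch, dim, freq, time)
        b, d, f, t = x.shape
        # Sub-band path: treat each frequency bin as a sequence over time.
        s = x.permute(0, 2, 3, 1).reshape(b * f, t, d)
        s = self.sub_proj(self.sub_band(s)[0]).reshape(b, f, t, d)
        x = x + s.permute(0, 3, 1, 2)
        # Full-band path: treat each frame as a sequence over frequency.
        s = x.permute(0, 3, 2, 1).reshape(b * t, f, d)
        s = self.full_proj(self.full_band(s)[0]).reshape(b, t, f, d)
        return x + s.permute(0, 3, 2, 1)
```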

TF-GridNet: Making time-frequency domain models great again for monaural speaker separation

ZQ Wang, S Cornell, S Choi, Y Lee… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We propose TF-GridNet, a novel multi-path deep neural network (DNN) operating in the time-
frequency (TF) domain, for monaural talker-independent speaker separation in anechoic …

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

E Tzinis, Y Adi, VK Ithapu, B Xu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
We present RemixIT, a simple yet effective self-supervised method for training speech
enhancement models without the need for a single isolated in-domain speech signal or noise waveform …
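
A compact sketch of one bootstrapped-remixing training step as described in the snippet: a teacher separates unlabeled in-domain mixtures into speech and noise estimates, the noise estimates are permuted across the batch and remixed with the speech estimates, and the student is trained against the teacher's estimates. The MSE loss and EMA teacher update below are placeholders for the paper's actual choices.

```python
# Sketch of one RemixIT-style self-training step (placeholder loss and teacher
# update, not the authors' exact training recipe).
import torch

def remixit_step(student, teacher, optimizer, noisy_batch, ema=0.99):
    # 1) Teacher produces pseudo speech/noise targets from unlabeled mixtures.
    with torch.no_grad():
        est_speech, est_noise = teacher(noisy_batch)
    # 2) Bootstrapped remixing: permute the noise estimates across the batch
    #    and add them back to the speech estimates to create new mixtures.
    perm = torch.randperm(noisy_batch.shape[0])
    remixed = est_speech + est_noise[perm]
    # 3) Student is trained to recover the teacher's estimates from the remix.
    stu_speech, stu_noise = student(remixed)
    loss = torch.mean((stu_speech - est_speech) ** 2) + \
           torch.mean((stu_noise - est_noise[perm]) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # 4) Slowly update the teacher from the student (one possible protocol).
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(ema).add_(ps, alpha=1 - ema)
    return loss.item()
```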

An efficient encoder-decoder architecture with top-down attention for speech separation

K Li, R Yang, X Hu - arXiv preprint arXiv:2209.15200, 2022 - arxiv.org
Deep neural networks have shown excellent prospects in speech separation tasks.
However, obtaining good results while keeping a low model complexity remains challenging …

Speech separation using an asynchronous fully recurrent convolutional neural network

X Hu, K Li, W Zhang, Y Luo… - Advances in …, 2021 - proceedings.neurips.cc
Recent advances in the design of neural network architectures, in particular those
specialized in modeling sequences, have provided significant improvements in speech …

MossFormer: Pushing the performance limit of monaural speech separation using gated single-head transformer with convolution-augmented joint self-attentions

S Zhao, B Ma - … 2023-2023 IEEE International Conference on …, 2023 - ieeexplore.ieee.org
Transformer-based models have provided significant performance improvements in
monaural speech separation. However, there is still a performance gap compared to a …
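
The title names a gated single-head transformer with convolution-augmented self-attention; the block below is only a very loose illustration of that general pattern (single-head attention plus a depthwise-convolution branch, combined under a sigmoid gate) and is not MossFormer's actual architecture.

```python
# Very loose illustration of a gated, convolution-augmented single-head
# attention block (not MossFormer's actual block design).
import torch
import torch.nn as nn

class GatedConvAttention(nn.Module):
    def __init__(self, dim=256, kernel=17):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim)
        self.gate = nn.Linear(dim, dim)
        # depthwise convolution injects local context alongside attention
        self.dwconv = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                                # x: (batch, time, dim)
        h = self.norm(x)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        # single-head scaled dot-product attention over the whole sequence
        att = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1) @ v
        # convolutional branch captures local patterns
        conv = self.dwconv(h.transpose(1, 2)).transpose(1, 2)
        # sigmoid gate modulates the combined branches (GLU-style gating)
        gated = torch.sigmoid(self.gate(h)) * (att + conv)
        return x + self.out(gated)
```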