Audio self-supervised learning: A survey

S Liu, A Mallol-Ragolta, E Parada-Cabaleiro, K Qian… - Patterns, 2022 - cell.com
Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) aims to discover general representations from large-scale data. This …

Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural network (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

Investigating self-supervised learning for speech enhancement and separation

Z Huang, S Watanabe, S Yang, P García… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Speech enhancement and separation are two fundamental tasks for robust speech
processing. Speech enhancement suppresses background noise while speech separation …

BioCPPNet: automatic bioacoustic source separation with deep neural networks

PC Bermant - Scientific Reports, 2021 - nature.com
We introduce the Bioacoustic Cocktail Party Problem Network (BioCPPNet), a
lightweight, modular, and robust U-Net-based machine learning architecture optimized for …

Heterogeneous target speech separation

E Tzinis, G Wichern, A Subramanian… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce a new paradigm for single-channel target source separation where the
sources of interest can be distinguished using non-mutually exclusive concepts (e.g., …

Efficient transformer-based speech enhancement using long frames and STFT magnitudes

D de Oliveira, T Peer, T Gerkmann - arXiv preprint arXiv:2206.11703, 2022 - arxiv.org
The SepFormer architecture shows very good results in speech separation. Like other
learned-encoder models, it uses short frames, as they have been shown to obtain better …

Don't speak too fast: The impact of data bias on self-supervised speech models

Y Meng, YH Chou, AT Liu, H Lee - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Self-supervised Speech Models (S3Ms) have been proven successful in many speech
downstream tasks, like ASR. However, how pretraining data affects S3Ms' downstream …

Audio-visual speech enhancement and separation by utilizing multi-modal self-supervised embeddings

IC Chern, KH Hung, YT Chen, T Hussain… - … , Speech, and Signal …, 2023 - ieeexplore.ieee.org
AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective
for categorical problems such as automatic speech recognition and lip-reading. This …

Improving reverberant speech separation with synthetic room impulse responses

R Aralikatti, A Ratnarajah, Z Tang… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
We present a novel approach that improves the performance of reverberant speech
separation. Our approach is based on an accurate geometric acoustic simulator (GAS) …

Embedding recurrent layers with dual-path strategy in a variant of convolutional network for speaker-independent speech separation

X Yang, C Bao - arXiv preprint arXiv:2203.13574, 2022 - arxiv.org
Speaker-independent speech separation has achieved remarkable performance in recent
years with the development of deep neural networks (DNNs). Various network architectures …