LibriMix: An open-source dataset for generalizable speech separation

J Cosentino, M Pariente, S Cornell, A Deleforge… - arXiv preprint arXiv …, 2020 - arxiv.org
In recent years, wsj0-2mix has become the reference dataset for single-channel speech
separation. Most deep learning-based speech separation models today are benchmarked …

Cardiopulmonary auscultation enhancement with a two-stage noise cancellation approach

C Yang, N Dai, Z Wang, S Cai, J Wang, N Hu - … Signal Processing and …, 2023 - Elsevier
For cardiopulmonary auscultation using electronic stethoscopes, signal quality is a key
point. During signal acquisition various background sounds may be inevitably captured …

On loss functions and evaluation metrics for music source separation

E Gusó, J Pons, S Pascual… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
We investigate which loss functions provide better separations via benchmarking an
extensive set of those for music source separation. To that end, we first survey the most …

LibriheavyMix: a 20,000-hour dataset for single-channel reverberant multi-talker speech separation, ASR and speaker diarization

Z Jin, Y Yang, M Shi, W Kang, X Yang, Z Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
The evolving speech processing landscape is increasingly focused on complex scenarios
like meetings or cocktail parties with multiple simultaneous speakers and far-field conditions …

GASS: Generalizing audio source separation with large-scale data

J Pons, X Liu, S Pascual, J Serrà - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Universal source separation aims at separating the audio sources of an arbitrary mix,
removing the constraint to operate on a specific domain like speech or music. Yet, the …

Improved Speech Enhancement Using TCN with Multiple Encoder-Decoder Layers

V Kishore, N Tiwari, P Paramasivam - Interspeech, 2020 - interspeech2020.org
A deep learning based time domain single-channel speech enhancement technique using
multilayer encoder-decoder and a temporal convolutional network is proposed for use in …

Att-TasNet: Attending to Encodings in Time-Domain Audio Speech Separation of Noisy, Reverberant Speech Mixtures

W Ravenscroft, S Goetze, T Hain - Frontiers in Signal Processing, 2022 - frontiersin.org
Separation of speech mixtures in noisy and reverberant environments remains a
challenging task for state-of-the-art speech separation systems. Time-domain audio speech …

Quantitative evidence on overlooked aspects of enrollment speaker embeddings for target speaker separation

X Liu, X Li, J Serrà - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
Single channel target speaker separation (TSS) aims at extracting a speaker's voice from a
mixture of multiple talkers given an enrollment utterance of that speaker. A typical deep …

Speaker verification based on single channel speech separation

R Jin, M Ablimit, A Hamdulla - IEEE Access, 2023 - ieeexplore.ieee.org
In multi-speaker scenarios, speech processing tasks like speaker identification and speech
recognition are susceptible to noise and overlapped voices. As the overlapped voices are a …

PodcastMix: A dataset for separating music and speech in podcasts

N Schmidt, J Pons, M Miron - arXiv preprint arXiv:2207.07403, 2022 - arxiv.org
We introduce PodcastMix, a dataset formalizing the task of separating background music
and foreground speech in podcasts. We aim at defining a benchmark suitable for training …