UNSSOR: unsupervised neural speech separation by leveraging over-determined training mixtures

ZQ Wang, S Watanabe - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In reverberant conditions with multiple concurrent speakers, each microphone acquires a
mixture signal of multiple speakers at a different location. In over-determined conditions …

Speech separation with pretrained frontend to minimize domain mismatch

W Wang, Z Pan, X Li, S Wang… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Speech separation seeks to separate individual speech signals from a speech mixture.
Typically, most separation models are trained on synthetic data due to the unavailability of …
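Synthetic training data is usually built by adding a scaled interfering utterance to a target utterance at a chosen signal-to-interference ratio. A minimal sketch, assuming single-channel signals of equal length held as Python lists; the helper names `mean_power` and `mix_at_snr` are hypothetical and not taken from the paper:

```python
import math

def mean_power(x):
    """Average power (mean of squared samples) of a list of floats."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(target, interferer, snr_db):
    """Scale `interferer` so that the target-to-interferer power ratio
    equals `snr_db`, then add it to `target`.

    Returns (mixture, scaled_interferer)."""
    if len(target) != len(interferer):
        raise ValueError("signals must have equal length")
    p_t = mean_power(target)
    p_i = mean_power(interferer)
    if p_i == 0.0:
        raise ValueError("interferer is silent")
    # Solve p_t / (gain**2 * p_i) = 10**(snr_db / 10) for gain.
    gain = math.sqrt(p_t / (p_i * 10 ** (snr_db / 10)))
    scaled = [gain * v for v in interferer]
    mixture = [t + s for t, s in zip(target, scaled)]
    return mixture, scaled
```

The scaled interferer is returned alongside the mixture so that both clean references remain available as supervised training targets.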

Speech separation with large-scale self-supervised learning

Z Chen, N Kanda, J Wu, Y Wu, X Wang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Self-supervised learning (SSL) methods such as WavLM have shown promising speech
separation (SS) results in small-scale simulation-based experiments. In this work, we extend …

Neural speech enhancement with unsupervised pre-training and mixture training

X Hao, C Xu, L Xie - Neural Networks, 2023 - Elsevier
Supervised neural speech enhancement methods always require large-scale paired
noisy and clean speech data. Since collecting adequate paired data from real-world …

Self-remixing: Unsupervised speech separation via separation and remixing

K Saijo, T Ogawa - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
We present Self-Remixing, a novel self-supervised speech separation method, which refines
a pre-trained separation model in an unsupervised manner. Self-Remixing consists of a …
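Going by the published description, Self-Remixing pairs a "shuffler", which separates observed mixtures and remixes the separated sources across the batch into pseudo-mixtures, with a "solver", which separates the pseudo-mixtures and remixes its outputs back into the observed mixtures. The toy sketch below covers only that shuffle-and-remix bookkeeping, under stated assumptions: sources are lists of floats, `perms[n]` is a hypothetical per-slot permutation of the batch, and the solver's outputs are assumed to already be in the same slot order (the networks, output alignment, and loss are omitted):

```python
def remix(sources, perms):
    """Build pseudo-mixtures from separated sources.

    sources[b][n] is source n separated from mixture b.
    Pseudo-mixture b takes source n from mixture perms[n][b]."""
    B, N, T = len(sources), len(sources[0]), len(sources[0][0])
    for p in perms:
        if sorted(p) != list(range(B)):
            raise ValueError("each perms[n] must permute range(B)")
    pseudo = []
    for b in range(B):
        mix = [0.0] * T
        for n in range(N):
            src = sources[perms[n][b]][n]
            for t in range(T):
                mix[t] += src[t]
        pseudo.append(mix)
    return pseudo

def unremix(pseudo_sources, perms):
    """Route sources separated from pseudo-mixtures back to the
    original mixtures they came from and sum them."""
    B, N, T = len(pseudo_sources), len(pseudo_sources[0]), len(pseudo_sources[0][0])
    recon = [[0.0] * T for _ in range(B)]
    for b in range(B):
        for n in range(N):
            origin = perms[n][b]  # mixture that source n of pseudo-mixture b came from
            for t in range(T):
                recon[origin][t] += pseudo_sources[b][n][t]
    return recon
```

If the solver returned exactly the shuffler's sources, `unremix` would reproduce the sum of each mixture's separated sources; in training, the gap between the remixed output and the observed mixture is what supplies the unsupervised signal.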

PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings

J Kalda, R Marxer, T Alumäe, H Bredin - arXiv preprint arXiv:2403.02288, 2024 - arxiv.org
A major drawback of supervised speech separation (SSep) systems is their reliance on
synthetic data, leading to poor real-world generalization. Mixture invariant training (MixIT) …
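MixIT trains on a mixture of two mixtures: the model outputs several sources, each source is assigned to one of the two original mixtures, and the loss is taken under the best assignment. A minimal sketch, assuming pure-Python lists, exhaustive search over all 2^M assignments, and plain negative SNR as the per-mixture loss (the MixIT paper uses a soft-thresholded SNR instead); `neg_snr` and `mixit_loss` are hypothetical names:

```python
import itertools
import math

def neg_snr(ref, est, eps=1e-8):
    """Negative SNR in dB (lower is better); eps avoids log(0)."""
    num = sum(r * r for r in ref)
    err = sum((r - e) ** 2 for r, e in zip(ref, est))
    return -10 * math.log10((num + eps) / (err + eps))

def mixit_loss(mix1, mix2, est_sources):
    """Assign each estimated source to mix1 (0) or mix2 (1), remix,
    and keep the assignment with the lowest summed loss."""
    T = len(mix1)
    best_loss, best_assign = math.inf, None
    for assign in itertools.product((0, 1), repeat=len(est_sources)):
        remixed = [[0.0] * T, [0.0] * T]
        for src, k in zip(est_sources, assign):
            for t in range(T):
                remixed[k][t] += src[t]
        loss = neg_snr(mix1, remixed[0]) + neg_snr(mix2, remixed[1])
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign
```

The exhaustive search grows as 2^M, which is tolerable for the small source counts typical of separation models.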

Efficient personalized speech enhancement through self-supervised learning

A Sivaraman, M Kim - IEEE Journal of Selected Topics in Signal …, 2022 - ieeexplore.ieee.org
This work presents self-supervised learning methods for monaural speaker-specific (i.e.,
personalized) speech enhancement models. While general-purpose models must broadly …

Unsupervised multi-channel separation and adaptation

C Han, K Wilson, S Wisdom… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
A key challenge in machine learning is to generalize from training data to an application
domain of interest. This work extends the recently-proposed mixture invariant training (MixIT) …

Reverberation as Supervision for Speech Separation

R Aralikatti, C Boeddeker, G Wichern… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
This paper proposes reverberation as supervision (RAS), a novel unsupervised loss function
for single-channel reverberant speech separation. Prior methods for unsupervised …

Using semi-supervised learning for monaural time-domain speech separation with a self-supervised learning-based SI-SNR estimator

S Dang, T Matsumoto, Y Takeuchi, H Kudo - Interspeech 2023, 2023 - isca-archive.org
Speech separation aims to decompose mixed speech into independent signals. Prior
research on monaural time-domain speech separation has made great progress in …
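The entry's title refers to an SI-SNR estimator. The sketch below computes SI-SNR itself from a reference signal using the common zero-mean definition; it is not the paper's estimator. Signals are assumed to be equal-length lists of floats, and `si_snr` is a hypothetical helper name:

```python
import math

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB of `est` with respect to `ref`."""
    if len(est) != len(ref):
        raise ValueError("length mismatch")
    n = len(ref)
    # Remove the mean so a DC offset does not affect the score.
    e = [v - sum(est) / n for v in est]
    r = [v - sum(ref) / n for v in ref]
    # Project the estimate onto the reference: s_target = (<e, r> / ||r||^2) r.
    scale = sum(a * b for a, b in zip(e, r)) / (sum(b * b for b in r) + eps)
    target = [scale * b for b in r]
    noise = [a - t for a, t in zip(e, target)]
    t_energy = sum(t * t for t in target)
    n_energy = sum(x * x for x in noise)
    return 10 * math.log10((t_energy + eps) / (n_energy + eps))
```

Because the reference is rescaled by the projection, multiplying the estimate by any nonzero constant leaves the score (almost exactly, up to `eps`) unchanged.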