GPU-accelerated guided source separation for meeting transcription

D Raj, D Povey, S Khudanpur - arXiv preprint arXiv:2212.05271, 2022 - arxiv.org
Guided source separation (GSS) is a type of target-speaker extraction method that relies on
pre-computed speaker activities and blind source separation to perform front-end …

Self-remixing: Unsupervised speech separation via separation and remixing

K Saijo, T Ogawa - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
We present Self-Remixing, a novel self-supervised speech separation method, which refines
a pre-trained separation model in an unsupervised manner. Self-Remixing consists of a …

Unsupervised multi-channel separation and adaptation

C Han, K Wilson, S Wisdom… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
A key challenge in machine learning is to generalize from training data to an application
domain of interest. This work extends the recently-proposed mixture invariant training (MixIT) …

Location as supervision for weakly supervised multi-channel source separation of machine sounds

R Falcon-Perez, G Wichern… - 2023 IEEE Workshop …, 2023 - ieeexplore.ieee.org
In this work, we are interested in learning a model to separate sources that cannot be
recorded in isolation, such as parts of a machine that must run simultaneously in order for …

Sound event localization and detection with pre-trained audio spectrogram transformer and multichannel separation network

R Scheibler, T Komatsu, Y Fujita, M Hentschel - omni (1ch), 2022 - dcase.community
We propose a sound event localization and detection system based on a CNN-Conformer
base network. Our main contribution is to evaluate the use of pre-trained elements in this …

Multi-resolution location-based training for multi-channel continuous speech separation

H Taherian, DL Wang - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
The performance of automatic speech recognition (ASR) systems severely degrades when
multi-talker speech overlap occurs. In meeting environments, speech separation is typically …

Neural Fast Full-Rank Spatial Covariance Analysis for Blind Source Separation

Y Bando, Y Masuyama, AA Nugraha… - 2023 31st European …, 2023 - ieeexplore.ieee.org
This paper describes an efficient unsupervised learning method for a neural source
separation model that utilizes a probabilistic generative model of observed multichannel …

Joint separation and localization of moving sound sources based on neural full-rank spatial covariance analysis

H Munakata, Y Bando, R Takeda… - IEEE Signal …, 2023 - ieeexplore.ieee.org
This paper presents an unsupervised multichannel method that can separate moving sound
sources based on an amortized variational inference (AVI) of joint separation and …

Enhanced reverberation as supervision for unsupervised speech separation

K Saijo, G Wichern, FG Germain, Z Pan… - arXiv preprint arXiv …, 2024 - arxiv.org
Reverberation as supervision (RAS) is a framework that allows for training monaural speech
separation models from multi-channel mixtures in an unsupervised manner. In RAS, models …

So-DAS: A Two-Step Soft-Direction-Aware Speech Separation Framework

Y Yang, Q Hu, Q Zhao, P Zhang - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org
Most existing direction-aware speech separation systems lead to performance degradation
when the angle difference between speakers is small due to the low spatial discrimination …