Self-Remixing: Unsupervised speech separation via separation and remixing

K Saijo, T Ogawa - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
We present Self-Remixing, a novel self-supervised speech separation method, which refines
a pre-trained separation model in an unsupervised manner. Self-Remixing consists of a …

Sound event localization and detection with pre-trained audio spectrogram transformer and multichannel separation network

R Scheibler, T Komatsu, Y Fujita, M Hentschel - omni (1ch), 2022 - dcase.community
We propose a sound event localization and detection system based on a CNN-Conformer
base network. Our main contribution is to evaluate the use of pre-trained elements in this …

Microphone Array Signal Processing and Deep Learning for Speech Enhancement: Combining model-based and data-driven approaches to parameter estimation …

R Haeb-Umbach, T Nakatani, M Delcroix… - IEEE Signal …, 2025 - ieeexplore.ieee.org
Multichannel acoustic signal processing is a well-established and powerful tool to exploit the
spatial diversity between a target signal and nontarget or noise sources for signal …

Microphone Array Signal Processing and Deep Learning for Speech Enhancement

R Haeb-Umbach, T Nakatani, M Delcroix… - arXiv preprint arXiv …, 2025 - arxiv.org
Multi-channel acoustic signal processing is a well-established and powerful tool to exploit
the spatial diversity between a target signal and non-target or noise sources for signal …

A retrospective on multichannel speech and audio enhancement using machine and deep learning techniques

A dos Santos, P de Oliveira… - Proceedings of the 24th …, 2022 - researchgate.net
Speech enhancement aims to improve the perceptual quality and intelligibility of speech in
the presence of additive noise, reverberation, competing speech, background music etc …

End-to-end multi-speaker ASR with independent vector analysis

R Scheibler, W Zhang, X Chang… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
We develop an end-to-end system for multi-channel, multi-speaker automatic speech
recognition. We propose a frontend for joint source separation and dereverberation based …

3D CNN and conformer with audio spectrogram transformer for sound event detection and localization

R Scheibler, T Komatsu, Y Fujita, M Hentschel - omni (1ch), 2022 - dcase.community
We propose a network for sound event detection and localization based on a 3D CNN for
the extraction of spatial features followed by several conformer layers. The CNN performs …