Music source separation with band-split RNN

Y Luo, J Yu - IEEE/ACM Transactions on Audio, Speech, and …, 2023 - ieeexplore.ieee.org
The performance of music source separation (MSS) models has been greatly improved in
recent years thanks to the development of novel neural network architectures and training …

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

E Tzinis, Y Adi, VK Ithapu, B Xu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
We present RemixIT, a simple yet effective self-supervised method for training speech
enhancement without the need of a single isolated in-domain speech nor a noise waveform …

The Sound Demixing Challenge 2023-Music Demixing Track

G Fabbro, S Uhlich, CH Lai… - Trans. Int. Soc …, 2024 - account.transactions.ismir.net
This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge
(SDX'23). We provide a summary of the challenge setup and introduce the task of robust …

Deep learning approaches in topics of singing information processing

C Gupta, H Li, M Goto - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Singing, the vocal production of musical tones, is one of the most important elements of
music. Addressing the needs of real-world applications, the study of technologies related to …

Fostering the robustness of white-box deep neural network watermarks by neuron alignment

FQ Li, SL Wang, Y Zhu - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
The wide application of deep learning techniques is boosting the regulation of deep learning
models, especially deep neural networks (DNN), as commercial products. A necessary …

Continual self-training with bootstrapped remixing for speech enhancement

E Tzinis, Y Adi, VK Ithapu, B Xu… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
We propose RemixIT, a simple and novel self-supervised training method for speech
enhancement. The proposed method is based on a continuously self-training scheme that …

Improved singing voice separation with chromagram-based pitch-aware remixing

S Yuan, Z Wang, U Isik, R Giri, JM Valin… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Singing voice separation aims to separate music into vocals and accompaniment
components. One of the major constraints for the task is the limited amount of training data …

Speech emotion recognition using semi-supervised learning with efficient labeling strategies

Z Zhu, Y Sato - 2021 IEEE Automatic Speech Recognition and …, 2021 - ieeexplore.ieee.org
The collection of large amounts of labeled data for speech emotion recognition requires
considerable time and effort. As a result, the sizes of existing corpora are limited. One …

Semi-supervised time domain target speaker extraction with attention

Z Wang, R Giri, S Venkataramani, U Isik… - arXiv preprint arXiv …, 2022 - arxiv.org
In this work, we propose Exformer, a time-domain architecture for target speaker extraction. It
consists of a pre-trained speaker embedder network and a separator network based on …

Unsupervised Deep Unfolded Representation Learning for Singing Voice Separation

W Yuan, S Wang, J Wang, M Unoki… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
Learning effective vocal representations from a waveform mixture is a crucial but
challenging task for deep neural network (DNN)-based singing voice separation (SVS) …