Disentangling voice and content with self-supervision for speaker recognition

T Liu, KA Lee, Q Wang, H Li - Advances in Neural …, 2023 - proceedings.neurips.cc
For speaker recognition, it is difficult to extract an accurate speaker representation from
speech because of its mixture of speaker traits and content. This paper proposes a …

Unsupervised cross-domain singing voice conversion

A Polyak, L Wolf, Y Adi, Y Taigman - arXiv preprint arXiv:2008.02830, 2020 - arxiv.org
We present a wav-to-wav generative model for the task of singing voice conversion from any
identity. Our method utilizes both an acoustic model, trained for the task of automatic speech …

Genre-conditioned acoustic models for automatic lyrics transcription of polyphonic music

X Gao, C Gupta, H Li - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
Lyrics transcription of polyphonic music is challenging not only because the singing vocals
are corrupted by the background music, but also because the background music and the …

Polyscriber: Integrated fine-tuning of extractor and lyrics transcriber for polyphonic music

X Gao, C Gupta, H Li - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Lyrics transcription of polyphonic music is challenging as the background music affects lyrics
intelligibility. Typically, lyrics transcription can be performed by a two-step pipeline, i.e. a …

Self-transriber: Few-shot lyrics transcription with self-training

X Gao, X Yue, H Li - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
The current lyrics transcription approaches heavily rely on supervised learning with labeled
data, but such data are scarce and manual labeling of singing is expensive. How to benefit …

Phonetic posteriorgrams based many-to-many singing voice conversion via adversarial training

H Guo, H Lu, N Hu, C Zhang, S Yang, L Xie… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper describes an end-to-end adversarial singing voice conversion (EA-SVC)
approach. It can directly generate arbitrary singing waveform by given phonetic …

Music-robust automatic lyrics transcription of polyphonic music

X Gao, C Gupta, H Li - arXiv preprint arXiv:2204.03306, 2022 - arxiv.org
Lyrics transcription of polyphonic music is challenging because singing vocals are corrupted
by the background music. To improve the robustness of lyrics transcription to the …

Singing voice synthesis with vibrato modeling and latent energy representation

Y Song, W Song, W Zhang, Z Zhang… - 2022 IEEE 24th …, 2022 - ieeexplore.ieee.org
This paper proposes an expressive singing voice synthesis system by introducing explicit
vibrato modeling and latent energy representation. Vibrato is essential to the naturalness of …

Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

R Tao, KA Lee, RK Das, V Hautamäki, H Li - arXiv preprint arXiv …, 2022 - arxiv.org
We study a novel neural architecture and its training strategies of speaker encoder for
speaker recognition without using any identity labels. The speaker encoder is trained to …

Automatic lyrics transcription of polyphonic music

X Gao - 2022 - search.proquest.com
Automatic Lyrics Transcription of polyphonic music (ALTP) aims to recognize the
sung lyrics from singing vocals in the presence of instrumental music accompaniment, and it …