We present a database of parallel recordings of speech and singing, collected and released by the Human Language Technology (HLT) laboratory at the National University of …
X Gao, C Gupta, H Li - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
Lyrics transcription of polyphonic music is challenging not only because the singing vocals are corrupted by the background music, but also because the background music and the …
C Gupta, H Li, M Goto - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Singing, the vocal productionof musical tones, is one of the most important elements of music. Addressing the needs of real-world applications, the study of technologies related to …
X Gao, C Gupta, H Li - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Lyrics transcription of polyphonic music is challenging as the background music affects lyrics intelligibility. Typically, lyrics transcription can be performed by a two-step pipeline, ie a …
X Gao, X Yue, H Li - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
The current lyrics transcription approaches heavily rely on supervised learning with labeled data, but such data are scarce and manual labeling of singing is expensive. How to benefit …
J Parekh, P Rao, YH Yang - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
In this paper our goal is to convert a set of spoken lines into sung ones. Unlike previous signal processing based methods, we take a learning based approach to the problem. This …
X Gao, C Gupta, H Li - arXiv preprint arXiv:2204.03306, 2022 - arxiv.org
Lyrics transcription of polyphonic music is challenging because singing vocals are corrupted by the background music. To improve the robustness of lyrics transcription to the …
This paper presents a cross-lingual voice conversion framework that adopts a modularized neural network. The modularized neural network has a common input structure that is …
DY Wu, YH Yang - arXiv preprint arXiv:2005.13835, 2020 - arxiv.org
This paper investigates the use of generative adversarial network (GAN)-based models for converting the spectrogram of a speech signal into that of a singing one, without reference to …