Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Automatic lyrics transcription of polyphonic music with lyrics-chord multi-task learning

X Gao, C Gupta, H Li - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Lyrics are the words that make up a song, while chords are harmonic sets of multiple notes
in music. Lyrics and chords are generally essential information in music, i.e. unaccompanied …

Schubert Winterreise dataset: A multimodal scenario for music analysis

C Weiß, F Zalkow, V Arifi-Müller, M Müller… - Journal on Computing …, 2021 - dl.acm.org
This article presents a multimodal dataset comprising various representations and
annotations of Franz Schubert's song cycle Winterreise. Schubert's seminal work constitutes …

Automatic lyrics alignment and transcription in polyphonic music: Does background music help?

C Gupta, E Yılmaz, H Li - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
Automatic lyrics alignment and transcription in polyphonic music are challenging tasks
because the singing vocals are corrupted by the background music. In this work, we propose …

LyricWhiz: Robust multilingual zero-shot lyrics transcription by whispering to ChatGPT

L Zhuo, R Yuan, J Pan, Y Ma, Y Li, G Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription
method achieving state-of-the-art performance on various lyrics transcription datasets, even …

Transfer learning of wav2vec 2.0 for automatic lyric transcription

L Ou, X Gu, Y Wang - arXiv preprint arXiv:2207.09747, 2022 - arxiv.org
Automatic speech recognition (ASR) has progressed significantly in recent years due to the
emergence of large-scale datasets and the self-supervised learning (SSL) paradigm …

Phoneme level lyrics alignment and text-informed singing voice separation

K Schulze-Forster, CSJ Doire… - … /ACM Transactions on …, 2021 - ieeexplore.ieee.org
The goal of singing voice separation is to recover the vocals signal from music mixtures.
State-of-the-art performance is achieved by deep neural networks trained in a supervised …

Genre-conditioned acoustic models for automatic lyrics transcription of polyphonic music

X Gao, C Gupta, H Li - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
Lyrics transcription of polyphonic music is challenging not only because the singing vocals
are corrupted by the background music, but also because the background music and the …

Deep learning approaches in topics of singing information processing

C Gupta, H Li, M Goto - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Singing, the vocal production of musical tones, is one of the most important elements of
music. Addressing the needs of real-world applications, the study of technologies related to …