Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

MARBLE: Music audio representation benchmark for universal evaluation

R Yuan, Y Ma, Y Li, G Zhang, X Chen… - Advances in …, 2023 - proceedings.neurips.cc
In the era of extensive intersection between art and Artificial Intelligence (AI), such as image
generation and fiction co-creation, AI for music remains relatively nascent, particularly in …

Adapting pretrained speech model for Mandarin lyrics transcription and alignment

JY Wang, CI Leong, YC Lin, L Su… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
The tasks of automatic lyrics transcription and lyrics alignment have witnessed significant
performance improvements in the past few years. However, most of the previous works only …

Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model

J Huang, E Benetos - arXiv preprint arXiv:2406.17618, 2024 - arxiv.org
Multilingual automatic lyrics transcription (ALT) is a challenging task due to the limited
availability of labelled data and the challenges introduced by singing, compared to …

Roadmap towards Superhuman Speech Understanding using Large Language Models

F Bu, Y Zhang, X Wang, B Wang, Q Liu, H Li - arXiv preprint arXiv …, 2024 - arxiv.org
The success of large language models (LLMs) has prompted efforts to integrate speech and
audio data, aiming to create general foundation models capable of processing both textual …

PolySinger: Singing-Voice to Singing-Voice Translation from English to Japanese

S Antonisen, I López-Espejo - arXiv preprint arXiv:2407.14399, 2024 - arxiv.org
The speech domain prevails in the spotlight for several natural language processing (NLP)
tasks while the singing domain remains less explored. The culmination of NLP is the speech …

Lyrics Transcription for Humans: A Readability-Aware Benchmark

O Cífka, H Schreiber, L Miner, FR Stöter - arXiv preprint arXiv:2408.06370, 2024 - arxiv.org
Writing down lyrics for human consumption involves not only accurately capturing word
sequences, but also incorporating punctuation and formatting for clarity and to convey …

A Real-Time Lyrics Alignment System Using Chroma and Phonetic Features for Classical Vocal Performance

J Park, S Yong, T Kwon, J Nam - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
The goal of real-time lyrics alignment is to take live singing audio as input and to pinpoint the
exact position within given lyrics on the fly. The task can benefit real-world applications such …

Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark

O Cífka, C Dimitriou, C Wang, H Schreiber… - arXiv preprint arXiv …, 2023 - arxiv.org
Current automatic lyrics transcription (ALT) benchmarks focus exclusively on word content
and ignore the finer nuances of written lyrics including formatting and punctuation, which …

MIR-MLPop: A Multilingual Pop Music Dataset with Time-Aligned Lyrics and Audio

JY Wang, CC Wang, CI Leong… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We introduce MIR-MLPop, a publicly available multilingual pop music dataset designed for
automatic lyrics transcription and lyrics alignment in polyphonic music. The dataset …