Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Phoneme level lyrics alignment and text-informed singing voice separation

K Schulze-Forster, CSJ Doire… - … /ACM Transactions on …, 2021 - ieeexplore.ieee.org
The goal of singing voice separation is to recover the vocals signal from music mixtures.
State-of-the-art performance is achieved by deep neural networks trained in a supervised …

Deep learning approaches in topics of singing information processing

C Gupta, H Li, M Goto - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Singing, the vocal production of musical tones, is one of the most important elements of
music. Addressing the needs of real-world applications, the study of technologies related to …

MSTRE-Net: Multistreaming acoustic modeling for automatic lyrics transcription

E Demirel, S Ahlbäck, S Dixon - arXiv preprint arXiv:2108.02625, 2021 - arxiv.org
This paper makes several contributions to automatic lyrics transcription (ALT) research. Our
main contribution is a novel variant of the Multistreaming Time-Delay Neural Network …

Phoneme-to-audio alignment with recurrent neural networks for speaking and singing voice

Y Teytaut, A Roebel - Proceedings of Interspeech 2021, 2021 - hal.science
Phoneme-to-audio alignment is the task of synchronizing voice recordings and their related
phonetic transcripts. In this work, we introduce a new system to forced phonetic alignment …

Low resource audio-to-lyrics alignment from polyphonic music recordings

E Demirel, S Ahlbäck, S Dixon - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Lyrics alignment in long music recordings can be memory exhaustive when performed in a
single pass. In this study, we present a novel method that performs audio-to-lyrics alignment …

Contrastive learning-based audio to lyrics alignment for multiple languages

S Durand, D Stoller, S Ewert - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Lyrics alignment gained considerable attention in recent years. State-of-the-art systems
either re-use established speech recognition toolkits, or design end-to-end solutions …

A study on constraining Connectionist Temporal Classification for temporal audio alignment

Y Teytaut, B Bouvier, A Roebel - Interspeech 2022, 2022 - hal.science
Connectionist Temporal Classification (CTC) has become a standard for deep learning-based
temporal alignment allowing relevant probabilistic distributions to be learned …

Improving lyrics alignment through joint pitch detection

J Huang, E Benetos, S Ewert - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
In recent years, the accuracy of automatic lyrics alignment methods has increased
considerably. Yet, many current approaches employ frameworks designed for automatic …

CTC-based learning of chroma features for score–audio music retrieval

F Zalkow, M Müller - IEEE/ACM Transactions on Audio, Speech …, 2021 - ieeexplore.ieee.org
This paper deals with a score–audio music retrieval task where the aim is to find relevant
audio recordings of Western classical music, given a short monophonic musical theme in …