Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

L Chen, Z Wang, S Ren, L Li, H Zhao, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Building on the foundations of language modeling in natural language processing, Next
Token Prediction (NTP) has evolved into a versatile training objective for machine learning …

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

Q Deng, Q Yang, R Yuan, Y Huang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Music composition represents the creative side of humanity, and itself is a complex task that
requires abilities to understand and generate information with long dependency and …

D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

H Que, J Liu, G Zhang, C Zhang, X Qu, Y Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to
expand the model's fundamental understanding of specific downstream domains (e.g., math …

Exploring Tokenization Methods for Multitrack Sheet Music Generation

Y Wang, S Wu, X Du, M Sun - arXiv preprint arXiv:2410.17584, 2024 - arxiv.org
This study explores the tokenization of multitrack sheet music in ABC notation, introducing
two methods: bar-stream and line-stream patching. We compare these methods against …