Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

M²UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models

S Liu, AS Hussain, C Sun, Y Shan - arXiv preprint arXiv:2311.11255, 2023 - arxiv.org
The current landscape of research leveraging large language models (LLMs) is
experiencing a surge. Many works harness the powerful reasoning capabilities of these …

MMedAgent: Learning to use medical tools with multi-modal agent

B Li, T Yan, Y Pan, J Luo, R Ji, J Ding, Z Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited
generality and often fall short when compared to specialized models. Recently, LLM-based …

MusicMagus: Zero-shot text-to-music editing via diffusion models

Y Zhang, Y Ikemiya, G Xia, N Murata… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in text-to-music generation models have opened new avenues in musical
creativity. However, music generation usually involves iterative refinements, and how to edit …

Large multimodal agents: A survey

J Xie, Z Chen, R Zhang, X Wan, G Li - arXiv preprint arXiv:2402.15116, 2024 - arxiv.org
Large language models (LLMs) have achieved superior performance in powering text-based
AI agents, endowing them with decision-making and reasoning abilities akin to …

LLMs Meet Multimodal Generation and Editing: A Survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

Q Deng, Q Yang, R Yuan, Y Huang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Music composition represents the creative side of humanity, and itself is a complex task that
requires abilities to understand and generate information with long dependency and …

Prompt-guided Precise Audio Editing with Diffusion Models

M Xu, C Li, D Su, W Liang, D Yu - arXiv preprint arXiv:2406.04350, 2024 - arxiv.org
Audio editing involves the arbitrary manipulation of audio content through precise control.
Although text-guided diffusion models have made significant advancements in text-to-audio …

Controllable Music Loops Generation with MIDI and Text via Multi-Stage Cross Attention and Instrument-Aware Reinforcement Learning

GY Chen, VW Soo - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
The burgeoning field of text-to-music generation models has shown great promise in their
ability to generate high-quality music aligned with users' textual descriptions. These models …

Retrieval guided music captioning via multimodal prefixes

N Srivatsan, K Chen, S Dubnov… - Thirty-Third International …, 2023 - hal.science
In this paper we put forward a new approach to music captioning, the task of automatically
generating natural language descriptions for songs. These descriptions are useful both for …