Loop Copilot: Conducting AI ensembles for music generation and iterative editing

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org

In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Cited by 10 Related articles All 4 versions

[PDF] arxiv.org

MUGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models

S Liu, AS Hussain, C Sun, Y Shan - arXiv preprint arXiv:2311.11255, 2023 - arxiv.org

The current landscape of research leveraging large language models (LLMs) is
experiencing a surge. Many works harness the powerful reasoning capabilities of these …

Cited by 24 Related articles

[PDF] arxiv.org

MMedAgent: Learning to use medical tools with multi-modal agent

B Li, T Yan, Y Pan, J Luo, R Ji, J Ding, Z Xu… - arXiv preprint arXiv …, 2024 - arxiv.org

Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited
generality and often fall short when compared to specialized models. Recently, LLM-based …

Cited by 14 Related articles All 4 versions

[PDF] arxiv.org

MusicMagus: Zero-shot text-to-music editing via diffusion models

Y Zhang, Y Ikemiya, G Xia, N Murata… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advances in text-to-music generation models have opened new avenues in musical
creativity. However, music generation usually involves iterative refinements, and how to edit …

Cited by 16 Related articles All 2 versions

[PDF] arxiv.org

Large multimodal agents: A survey

J Xie, Z Chen, R Zhang, X Wan, G Li - arXiv preprint arXiv:2402.15116, 2024 - arxiv.org

Large language models (LLMs) have achieved superior performance in powering
text-based AI agents, endowing them with decision-making and reasoning abilities akin to …

Cited by 33 Related articles All 2 versions

[PDF] arxiv.org

LLMs Meet Multimodal Generation and Editing: A Survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

Cited by 13 Related articles All 2 versions

[PDF] arxiv.org

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

Q Deng, Q Yang, R Yuan, Y Huang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Music composition represents the creative side of humanity, and itself is a complex task that
requires abilities to understand and generate information with long dependency and …

Cited by 12 Related articles All 2 versions

[PDF] arxiv.org

Prompt-guided Precise Audio Editing with Diffusion Models

M Xu, C Li, D Su, W Liang, D Yu - arXiv preprint arXiv:2406.04350, 2024 - arxiv.org

Audio editing involves the arbitrary manipulation of audio content through precise control.
Although text-guided diffusion models have made significant advancements in text-to-audio …

Cited by 1 Related articles All 3 versions

[PDF] openreview.net

Controllable Music Loops Generation with MIDI and Text via Multi-Stage Cross Attention and Instrument-Aware Reinforcement Learning

GY Chen, VW Soo - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org

The burgeoning field of text-to-music generation models has shown great promise in their
ability to generate high-quality music aligned with users' textual descriptions. These models …

Retrieval guided music captioning via multimodal prefixes

N Srivatsan, K Chen, S Dubnov… - Thirty-Third International …, 2023 - hal.science

In this paper we put forward a new approach to music captioning, the task of automatically
generating natural language descriptions for songs. These descriptions are useful both for …

Cited by 1 Related articles All 3 versions
