DISCO-10M: a large-scale music dataset

Foundation Models for Music: A Survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org

In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Cited by 12 · All 4 versions

SongCreator: Lyrics-based universal song generation

S Lei, Y Zhou, B Tang, MWY Lam, F Liu, H Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Music is an integral part of human culture, embodying human intelligence and creativity, of
which songs compose an essential part. While various aspects of song generation have …

Cited by 3 · All 3 versions

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

L Chen, Z Wang, S Ren, L Li, H Zhao, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Building on the foundations of language modeling in natural language processing, Next
Token Prediction (NTP) has evolved into a versatile training objective for machine learning …

SMITIN: Self-monitored inference-time intervention for generative music transformers

J Koo, G Wichern, FG Germain… - IEEE Open Journal …, 2025 - ieeexplore.ieee.org

We introduce Self-Monitored Inference-Time INtervention (SMITIN), an approach for
controlling an autoregressive generative music transformer using classifier probes. These …

Cited by 2 · All 2 versions

Discogs-VI: A musical version identification dataset based on public editorial metadata

RO Araz, X Serra, D Bogdanov - arXiv preprint arXiv:2410.17400, 2024 - arxiv.org

Current version identification (VI) datasets often lack sufficient size and musical diversity to
train robust neural networks (NNs). Additionally, their non-representative clique size …

Cited by 2 · All 2 versions

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos

YB Lin, Y Tian, L Yang, G Bertasius, H Wang - arXiv preprint arXiv …, 2024 - arxiv.org

We present a framework for learning to generate background music from video inputs.
Unlike existing works that rely on symbolic musical annotations, which are limited in quantity …

Cited by 1 · All 2 versions

CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models

S Wu, Y Wang, R Yuan, Z Guo, X Tan, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Challenges in managing linguistic diversity and integrating various musical modalities are
faced by current music information retrieval systems. These limitations reduce their …

Cited by 1 · All 2 versions

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

B Wang, L Zhuo, Z Wang, C Bao, W Chengjing… - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal music generation aims to produce music from diverse input modalities, including
text, videos, and images. Existing methods use a common embedding space for multimodal …

MusicScore: A Dataset for Music Score Modeling and Generation

Y Lin, Z Dai, Q Kong - arXiv preprint arXiv:2406.11462, 2024 - arxiv.org

Music scores are written representations of music and contain rich information about musical
components. The visual information on music scores includes notes, rests, staff lines, clefs …

Cited by 1

SymPAC: Scalable Symbolic Music Generation With Prompts And Constraints

H Chen, JBL Smith, J Spijkervet, JC Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Progress in the task of symbolic music generation may be lagging behind other tasks like
audio and text generation, in part because of the scarcity of symbolic training data. In this …
