Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

MuseCoco: Generating symbolic music from text

P Lu, X Xu, C Kang, B Yu, C Xing, X Tan… - arXiv preprint arXiv …, 2023 - arxiv.org
Generating music from text descriptions is a user-friendly mode since the text is a relatively
easy interface for user engagement. While some approaches utilize texts to control music …

ChatMusician: Understanding and generating music intrinsically with LLM

R Yuan, H Lin, Y Wang, Z Tian, S Wu, T Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
While Large Language Models (LLMs) demonstrate impressive capabilities in text
generation, we find that their ability has yet to be generalized to music, humanity's creative …

CLaMP: Contrastive language-music pre-training for cross-modal symbolic music information retrieval

S Wu, D Yu, X Tan, M Sun - arXiv preprint arXiv:2304.11029, 2023 - arxiv.org
We introduce CLaMP: Contrastive Language-Music Pre-training, which learns cross-modal
representations between natural language and symbolic music using a music encoder and …

MuPT: A generative symbolic music pretrained transformer

X Qu, Y Bai, Y Ma, Z Zhou, KM Lo, J Liu, R Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we explore the application of Large Language Models (LLMs) to the pre-
training of music. While the prevalent use of MIDI in music modeling is well-established, our …

PE-GPT: A New Paradigm for Power Electronics Design

F Lin, X Li, W Lei, JJ Rodriguez-Andina… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Large language models (LLMs) have shown exciting potential in powering the growth of
many industries, yet their adoption in the power electronics (PE) sector is hindered by a lack …

MMD-MII model: a multilayered analysis and multimodal integration interaction approach revolutionizing music emotion classification

J Wang, A Sharifi, TR Gadekallu, A Shankar - International Journal of …, 2024 - Springer
Music plays a vital role in human culture and society, serving as a universal form of
expression. However, accurately classifying music emotions remains challenging due to the …

MtArtGPT: A multi-task art generation system with pre-trained transformer

C Jin, R Zhu, Z Zhu, L Yang, M Yang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Instruction tuning large language models are making rapid advances in the field of artificial
intelligence where GPT-4 models have exhibited impressive multi-modal perception …

Generating symbolic music from natural language prompts using an LLM-enhanced dataset

W Xu, J McAuley, T Berg-Kirkpatrick, S Dubnov… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent years have seen many audio-domain text-to-music generation models that rely on
large amounts of text-audio pairs for training. However, symbolic-domain controllable music …