Contrastive audio-language learning for music

I Manco, E Benetos, E Quinton, G Fazekas - arXiv preprint arXiv …, 2022 - arxiv.org
As one of the most intuitive interfaces known to humans, natural language has the potential
to mediate many tasks that involve human-computer interaction, especially in application …

Toward universal text-to-music retrieval

SH Doh, M Won, K Choi, J Nam - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
This paper introduces effective design choices for text-to-music retrieval systems. An ideal
text-based retrieval system would support various input queries such as pre-defined tags …

Multimodal music datasets? Challenges and future goals in music processing

AM Christodoulou, O Lartillot, AR Jensenius - International Journal of …, 2024 - Springer
The term “multimodal music dataset” is often used to describe music-related datasets that
represent music as a multimedia art form and multimodal experience. However, the term …

Multimodal representation learning over heterogeneous networks for tag-based music retrieval

ACM da Silva, DF Silva, RM Marcacini - Expert Systems with Applications, 2022 - Elsevier
Learning how to represent data represented by features obtained from multiple modalities
through representation learning strategies has received much attention in Music Information …

Lyrics information processing: Analysis, generation, and applications

K Watanabe, M Goto - Proceedings of the 1st Workshop on NLP …, 2020 - aclanthology.org
In this paper we propose lyrics information processing (LIP) as a research field for
technologies focusing on lyrics text, which has both linguistic and musical characteristics …

On Class Separability Pitfalls In Audio-Text Contrastive Zero-Shot Learning

T Tavares, F Ayres, Z Wang, P Smaragdis - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in audio-text cross-modal contrastive learning have shown its potential
towards zero-shot learning. One possibility for this is by projecting item embeddings from pre …

A Chorus-Section Detection Method for Lyrics Text

K Watanabe, M Goto - ISMIR, 2020 - staff.aist.go.jp
This paper addresses the novel task of detecting chorus sections in English and Japanese
lyrics text. Although chorus-section detection using audio signals has been studied, whether …

Artist Similarity based on Heterogeneous Graph Neural Networks

ACM da Silva, DF Silva… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Music streaming platforms rely on recommending similar artists to maintain user
engagement, with artists benefiting from these suggestions to boost their popularity. Another …

Musical Word Embedding for Music Tagging and Retrieval

SH Doh, J Lee, D Jeong, J Nam - arXiv preprint arXiv:2404.13569, 2024 - arxiv.org
Word embedding has become an essential means for text-based information retrieval.
Typically, word embeddings are learned from large quantities of general and unstructured …

Exploring modality-agnostic representations for music classification

HH Wu, M Fuentes, JP Bello - arXiv preprint arXiv:2106.01149, 2021 - arxiv.org
Music information is often conveyed or recorded across multiple data modalities including
but not limited to audio, images, text and scores. However, music information retrieval …