Audio self-supervised learning: A survey

S Liu, A Mallol-Ragolta, E Parada-Cabaleiro, K Qian… - Patterns, 2022 - cell.com
Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) targets discovering general representations from large-scale data. This …

Contrastive self-supervised learning: review, progress, challenges and future research directions

P Kumar, P Rawat, S Chauhan - International Journal of Multimedia …, 2022 - Springer
In the last decade, deep supervised learning has had tremendous success. However, its
flaws, such as its dependency on manual and costly annotations on large datasets and …

BEATs: Audio pre-training with acoustic tokenizers

S Chen, Y Wu, C Wang, S Liu, D Tompkins… - arXiv preprint arXiv …, 2022 - arxiv.org
The massive growth of self-supervised learning (SSL) has been witnessed in language,
vision, speech, and audio domains over the past few years. While discrete label prediction is …

Contrastive learning based self-supervised time-series analysis

J Pöppelbaum, GS Chadha, A Schwung - Applied Soft Computing, 2022 - Elsevier
Deep learning architectures usually require large scale labeled datasets for achieving good
performance on general classification tasks including computer vision and natural language …

Contrastive learning of musical representations

J Spijkervet, JA Burgoyne - arXiv preprint arXiv:2103.09410, 2021 - arxiv.org
While deep learning has enabled great advances in many areas of music, labeled music
datasets remain especially hard, expensive, and time-consuming to create. In this work, we …

Domain‐specific neural networks improve automated bird sound recognition already with small amount of local data

P Lauha, P Somervuo, P Lehikoinen… - Methods in Ecology …, 2022 - Wiley Online Library
An automatic bird sound recognition system is a useful tool for collecting data of different
bird species for ecological analysis. Together with autonomous recording units (ARUs), such …

Contrastive audio-language learning for music

I Manco, E Benetos, E Quinton, G Fazekas - arXiv preprint arXiv …, 2022 - arxiv.org
As one of the most intuitive interfaces known to humans, natural language has the potential
to mediate many tasks that involve human-computer interaction, especially in application …

Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Deep learning methods for abstract visual reasoning: A survey on Raven's Progressive Matrices

M Małkiński, J Mańdziuk - ACM Computing Surveys, 2022 - dl.acm.org
Abstract visual reasoning (AVR) domain encompasses problems solving which requires the
ability to reason about relations among entities present in a given scene. While humans …

ASiT: Local-global audio spectrogram vision transformer for event classification

S Atito, M Awais, W Wang… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org
Transformers, which were originally developed for natural language processing, have
recently generated significant interest in the computer vision and audio communities due to …