Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Audio self-supervised learning: A survey

S Liu, A Mallol-Ragolta, E Parada-Cabaleiro, K Qian… - Patterns, 2022 - cell.com
Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) targets discovering general representations from large-scale data. This …

CLAP: Learning audio concepts from natural language supervision

B Elizalde, S Deshmukh, M Al Ismail… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Mainstream machine listening models are trained to learn audio concepts under the
paradigm of one class label to many recordings focusing on one task. Learning under such …

Pengi: An audio language model for audio tasks

S Deshmukh, B Elizalde, R Singh… - Advances in Neural …, 2023 - proceedings.neurips.cc
In the domain of audio processing, Transfer Learning has facilitated the rise of Self-
Supervised Learning and Zero-Shot Learning techniques. These approaches have led to …

Masked spectrogram modeling using masked autoencoders for learning general-purpose audio representation

D Niizumi, D Takeuchi, Y Ohishi… - … Evaluation of Audio …, 2022 - proceedings.mlr.press
Recent general-purpose audio representations show state-of-the-art performance on
various audio tasks. These representations are pre-trained by self-supervised learning …

Whisper-AT: Noise-robust automatic speech recognizers are also strong general audio event taggers

Y Gong, S Khurana, L Karlinsky, J Glass - arXiv preprint arXiv:2307.03183, 2023 - arxiv.org
In this paper, we focus on Whisper, a recent automatic speech recognition model trained
with a massive 680k hour labeled speech corpus recorded in diverse conditions. We first …

Global birdsong embeddings enable superior transfer learning for bioacoustic classification

B Ghani, T Denton, S Kahl, H Klinck - Scientific Reports, 2023 - nature.com
Automated bioacoustic analysis aids understanding and protection of both marine and
terrestrial animals and their habitats across extensive spatiotemporal scales, and typically …

Learning to detect an animal sound from five examples

I Nolasco, S Singh, V Morfi, V Lostanlen… - Ecological …, 2023 - Elsevier
Automatic detection and classification of animal sounds has many applications in
biodiversity monitoring and animal behavior. In the past twenty years, the volume of digitised …

BYOL for audio: Exploring pre-trained general-purpose audio representations

D Niizumi, D Takeuchi, Y Ohishi… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
Pre-trained models are essential as feature extractors in modern machine learning systems
in various domains. In this study, we hypothesize that representations effective for general …

MARBLE: Music audio representation benchmark for universal evaluation

R Yuan, Y Ma, Y Li, G Zhang, X Chen… - Advances in …, 2023 - proceedings.neurips.cc
In the era of extensive intersection between art and Artificial Intelligence (AI), such as image
generation and fiction co-creation, AI for music remains relatively nascent, particularly in …