Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

Audio self-supervised learning: A survey

S Liu, A Mallol-Ragolta, E Parada-Cabaleiro, K Qian… - Patterns, 2022 - cell.com
Similar to humans' cognitive ability to generalize knowledge and skills, self-supervised
learning (SSL) targets discovering general representations from large-scale data. This …

BEATs: Audio pre-training with acoustic tokenizers

S Chen, Y Wu, C Wang, S Liu, D Tompkins… - arXiv preprint arXiv …, 2022 - arxiv.org
The massive growth of self-supervised learning (SSL) has been witnessed in language,
vision, speech, and audio domains over the past few years. While discrete label prediction is …

TERA: Self-supervised learning of transformer encoder representation for speech

AT Liu, SW Li, H Lee - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org
We introduce a self-supervised speech pre-training method called TERA, which stands for
Transformer Encoder Representations from Alteration. Recent approaches often learn by …

Contrastive learning of general-purpose audio representations

A Saeed, D Grangier… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
We introduce COLA, a self-supervised pre-training approach for learning a general-purpose
representation of audio. Our approach is based on contrastive learning: it learns a …

PSLA: Improving audio tagging with pretraining, sampling, labeling, and aggregation

Y Gong, YA Chung, J Glass - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org
Audio tagging is an active research area and has a wide range of applications. Since the
release of AudioSet, great progress has been made in advancing model performance, which …

Contrastive learning of musical representations

J Spijkervet, JA Burgoyne - arXiv preprint arXiv:2103.09410, 2021 - arxiv.org
While deep learning has enabled great advances in many areas of music, labeled music
datasets remain especially hard, expensive, and time-consuming to create. In this work, we …

Towards learning universal audio representations

L Wang, P Luc, Y Wu, A Recasens… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
The ability to learn universal audio representations that can solve diverse speech, music,
and environment tasks can spur many applications that require general sound content …

Unsupervised contrastive learning of sound event representations

E Fonseca, D Ortego, K McGuinness… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Self-supervised representation learning can mitigate the limitations in recognition tasks with
few manually labeled data but abundant unlabeled data—a common scenario in sound …

Supervised and unsupervised learning of audio representations for music understanding

MC McCallum, F Korzeniowski, S Oramas… - arXiv preprint arXiv …, 2022 - arxiv.org
In this work, we provide a broad comparative analysis of strategies for pre-training audio
understanding models for several tasks in the music domain, including labelling of genre …