Pre-training audio representations with self-supervision

M Tagliasacchi, B Gfeller… - IEEE Signal …, 2020 - ieeexplore.ieee.org
learning of audio representations. We posit that contextual temporal information can be
exploited in the case of general audio … (i) We propose Audio2Vec, a self-supervised learning task …

Sound and visual representation learning with multiple pretraining tasks

AB Vasudevan, D Dai… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
audio. In our spatial alignment SSL, we differ with them in learning binaural sounds
representation. … task, given two video/audio frames, to learn video/audio representations. We …

Contrastive learning of general-purpose audio representations

A Saeed, D Grangier… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
… , a self-supervised pre-training approach for learning a general-purpose representation of
audio. Our approach … We learn general-purpose audio representations from unlabeled data by …

Pretext tasks selection for multitask self-supervised audio representation learning

S Zaiem, T Parcollet, S Essid… - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org
pretraining the encoder on the English Common Voice dataset and using the learned
representations … the method to changes in the pretraining dataset, in the audio data type and in the …

Transformer based unsupervised pre-training for acoustic representation learning

R Zhang, H Wu, W Li, D Jiang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
pre-training method using Transformer based encoder to learn a general and robust high-level
representation … by a large amount of unlabeled audio from various kinds of datasets. After …

VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

Q Zhu, L Zhou, Z Zhang, S Liu, B Jiao… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
… ) model for speech representation learning, with a unified pretraining object to leverage
different data sources, including paired visual-audio, audio-text, and unpaired audio and text. …

BYOL for audio: Self-supervised learning for general-purpose audio representation

D Niizumi, D Takeuchi, Y Ohishi… - … Joint Conference on …, 2021 - ieeexplore.ieee.org
… We propose learning general-purpose audio representation from a single audio segment
with … Experimental Setup: We repeated the cycle of pretraining and evaluation and averaged the …

Audio ALBERT: A lite BERT for self-supervised learning of audio representation

PH Chi, PH Chung, TH Wu, CC Hsieh… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
… At the pre-training stage, we train our models with learning rate 5e-5, batch size 50, and
AdamW optimizer [26] for 500k steps. The models are pre-trained on a single NVIDIA Tesla …

Multimodal self-supervised learning of general audio representations

L Wang, P Luc, A Recasens, JB Alayrac… - arXiv preprint arXiv …, 2021 - arxiv.org
… of contrastive learning of audio representations with the aid … video is not crucial to learn
strong audio representations. This … We pretrain our models on AudioSet [26] sampled at 16 kHz. …

BYOL for audio: Exploring pre-trained general-purpose audio representations

D Niizumi, D Takeuchi, Y Ohishi… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
… For pre-training, our BYOL variant framework with audio data augmentations learns a … We
adopt BYOL and learn representations invariant to input changes, relying on the changes of …