learning pretraining audio representation - Academic Resource Search

Pre-training audio representations with self-supervision

M Tagliasacchi, B Gfeller… - IEEE Signal …, 2020 - ieeexplore.ieee.org

… learning of audio representations. We posit that contextual temporal information can be
exploited in the case of general audio … (i) We propose Audio2Vec, a self-supervised learning task …

Cited by 56 · Related articles · All 5 versions

[PDF] thecvf.com

Sound and visual representation learning with multiple pretraining tasks

AB Vasudevan, D Dai… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

… audio. In our spatial alignment SSL, we differ with them in learning binaural sounds
representation. [… task, given two video/audio frames, to learn video/audio representations. We …

Cited by 7 · Related articles · All 11 versions

[PDF] arxiv.org

Contrastive learning of general-purpose audio representations

A Saeed, D Grangier… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org

… , a self-supervised pre-training approach for learning a general-purpose representation of
audio. Our approach … We learn general-purpose audio representations from unlabeled data by …

Cited by 258 · Related articles · All 7 versions

[PDF] arxiv.org

Pretext tasks selection for multitask self-supervised audio representation learning

S Zaiem, T Parcollet, S Essid… - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org

… pretraining the encoder on the English Common Voice dataset and using the learned
representations … the method to changes in the pretraining dataset, in the audio data type and in the …

Cited by 18 · Related articles · All 10 versions

[PDF] arxiv.org

Transformer based unsupervised pre-training for acoustic representation learning

R Zhang, H Wu, W Li, D Jiang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

… pre-training method using Transformer based encoder to learn a general and robust high-level
representation … by a large amount of unlabeled audio from various kinds of datasets. After …

Cited by 33 · Related articles · All 4 versions

[PDF] arxiv.org

VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

Q Zhu, L Zhou, Z Zhang, S Liu, B Jiao… - IEEE Transactions …, 2023 - ieeexplore.ieee.org

… ) model for speech representation learning, with a unified pretraining object to leverage
different data sources, including paired visual-audio, audio-text, and unpaired audio and text. …

Cited by 27 · Related articles · All 3 versions

[PDF] arxiv.org

BYOL for audio: Self-supervised learning for general-purpose audio representation

D Niizumi, D Takeuchi, Y Ohishi… - … Joint Conference on …, 2021 - ieeexplore.ieee.org

… We propose learning general-purpose audio representation from a single audio segment
with… Experimental Setup We repeated the cycle of pretraining and evaluation and averaged the …

Cited by 152 · Related articles · All 5 versions

[PDF] arxiv.org

Audio ALBERT: A lite BERT for self-supervised learning of audio representation

PH Chi, PH Chung, TH Wu, CC Hsieh… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org

… At the pre-training stage, we train our models with learning rate 5e-5, batch size 50, and
AdamW optimizer [26] for 500k steps. The models are pre-trained on a single NVIDIA Tesla …

Cited by 169 · Related articles · All 6 versions

[PDF] arxiv.org

Multimodal self-supervised learning of general audio representations

L Wang, P Luc, A Recasens, JB Alayrac… - arXiv preprint arXiv …, 2021 - arxiv.org

… of contrastive learning of audio representations with the aid … video is not crucial to learn
strong audio representations. This … We pretrain our models on AudioSet [26] sampled at 16 kHz. …

Cited by 45 · Related articles · All 4 versions

[PDF] ieee.org

BYOL for audio: Exploring pre-trained general-purpose audio representations

D Niizumi, D Takeuchi, Y Ohishi… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org

… For pre-training, our BYOL variant framework with audio data augmentations learns a … We
adopt BYOL and learn representations invariant to input changes, relying on the changes of …

Cited by 40 · Related articles · All 5 versions
