Efficient self-supervised learning with contextualized target representations for vision,...

M Assran, Q Duval, I Misra… - Proceedings of the …, 2023 - openaccess.thecvf.com

This paper demonstrates an approach for learning highly semantic image representations
without relying on hand-crafted data-augmentations. We introduce the Image-based Joint …

Cited by 261  Related articles  All 7 versions

[PDF] neurips.cc

Where are we in the search for an artificial visual cortex for embodied intelligence?

A Majumdar, K Yadav, S Arnaud, J Ma… - Advances in …, 2023 - proceedings.neurips.cc

We present the largest and most comprehensive empirical study of pre-trained visual
representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate …

Cited by 117  Related articles  All 6 versions

[PDF] arxiv.org

One-peace: Exploring one general representation model toward unlimited modalities

P Wang, S Wang, J Lin, S Bai, X Zhou, J Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org

In this work, we explore a scalable way for building a general representation model toward
unlimited modalities. We release ONE-PEACE, a highly extensible model with 4B …

Cited by 96  Related articles  All 3 versions

[PDF] arxiv.org

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org

The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

Cited by 57  Related articles  All 4 versions

[PDF] arxiv.org

Firerisk: A remote sensing dataset for fire risk assessment with benchmarks using supervised and self-supervised learning

S Shen, S Seneviratne, X Wanyan… - … Conference on Digital …, 2023 - ieeexplore.ieee.org

In recent decades, wildfires have caused tremendous property losses, fatalities, and
extensive damage to forest ecosystems. Inspired by the abundance of publicly available …

Cited by 318  Related articles  All 8 versions

[PDF] neurips.cc

CROMA: Remote sensing representations with contrastive radar-optical masked autoencoders

A Fuller, K Millard, J Green - Advances in Neural …, 2024 - proceedings.neurips.cc

A vital and rapidly growing application, remote sensing offers vast yet sparsely labeled,
spatially aligned multimodal data; this makes self-supervised learning algorithms invaluable …

Cited by 20  Related articles  All 6 versions

[PDF] mdpi.com

Consequential Advancements of Self-Supervised Learning (SSL) in Deep Learning Contexts

MM Abdulrazzaq, NTA Ramaha, AA Hameed… - Mathematics, 2024 - mdpi.com

Self-supervised learning (SSL) is a potential deep learning (DL) technique that uses
massive volumes of unlabeled data to train neural networks. SSL techniques have evolved …

Cited by 6  Related articles  All 5 versions

[PDF] arxiv.org

Av-data2vec: Self-supervised learning of audio-visual speech representations with contextualized target representations

J Lian, A Baevski, WN Hsu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Self-supervision has shown great potential for audio-visual speech recognition by vastly
reducing the amount of labeled data required to build good systems. However, existing …

Cited by 24  Related articles  All 3 versions

[PDF] arxiv.org

emotion2vec: Self-supervised pre-training for speech emotion representation

Z Ma, Z Zheng, J Ye, J Li, Z Gao, S Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

We propose emotion2vec, a universal speech emotion representation model. emotion2vec
is pre-trained on open-source unlabeled emotion data through self-supervised online …

Cited by 42  Related articles  All 2 versions

[PDF] arxiv.org

LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech

T Parcollet, H Nguyen, S Evain, MZ Boito… - Computer Speech & …, 2024 - Elsevier

Self-supervised learning (SSL) is at the origin of unprecedented improvements in many
different domains including computer vision and natural language processing. Speech …

Cited by 15  Related articles  All 13 versions
