Self-supervised learning from images with a joint-embedding predictive architecture

M Assran, Q Duval, I Misra… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper demonstrates an approach for learning highly semantic image representations
without relying on hand-crafted data-augmentations. We introduce the Image-based Joint …

Where are we in the search for an artificial visual cortex for embodied intelligence?

A Majumdar, K Yadav, S Arnaud, J Ma… - Advances in …, 2023 - proceedings.neurips.cc
We present the largest and most comprehensive empirical study of pre-trained visual
representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate …

ONE-PEACE: Exploring one general representation model toward unlimited modalities

P Wang, S Wang, J Lin, S Bai, X Zhou, J Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we explore a scalable way for building a general representation model toward
unlimited modalities. We release ONE-PEACE, a highly extensible model with 4B …

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

FireRisk: A remote sensing dataset for fire risk assessment with benchmarks using supervised and self-supervised learning

S Shen, S Seneviratne, X Wanyan… - … Conference on Digital …, 2023 - ieeexplore.ieee.org
In recent decades, wildfires have caused tremendous property losses, fatalities, and
extensive damage to forest ecosystems. Inspired by the abundance of publicly available …

CROMA: Remote sensing representations with contrastive radar-optical masked autoencoders

A Fuller, K Millard, J Green - Advances in Neural …, 2024 - proceedings.neurips.cc
A vital and rapidly growing application, remote sensing offers vast yet sparsely labeled,
spatially aligned multimodal data; this makes self-supervised learning algorithms invaluable …

Consequential Advancements of Self-Supervised Learning (SSL) in Deep Learning Contexts

MM Abdulrazzaq, NTA Ramaha, AA Hameed… - Mathematics, 2024 - mdpi.com
Self-supervised learning (SSL) is a potential deep learning (DL) technique that uses
massive volumes of unlabeled data to train neural networks. SSL techniques have evolved …

AV-data2vec: Self-supervised learning of audio-visual speech representations with contextualized target representations

J Lian, A Baevski, WN Hsu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Self-supervision has shown great potential for audio-visual speech recognition by vastly
reducing the amount of labeled data required to build good systems. However, existing …

emotion2vec: Self-supervised pre-training for speech emotion representation

Z Ma, Z Zheng, J Ye, J Li, Z Gao, S Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose emotion2vec, a universal speech emotion representation model. emotion2vec
is pre-trained on open-source unlabeled emotion data through self-supervised online …

LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech

T Parcollet, H Nguyen, S Evain, MZ Boito… - Computer Speech & …, 2024 - Elsevier
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many
different domains including computer vision and natural language processing. Speech …