DINOv2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

Mathematical discoveries from program search with large language models

B Romera-Paredes, M Barekatain, A Novikov, M Balog… - Nature, 2024 - nature.com
Large language models (LLMs) have demonstrated tremendous capabilities in solving
complex tasks, from quantitative reasoning to understanding natural language. However …

Sigmoid loss for language image pre-training

X Zhai, B Mustafa, A Kolesnikov… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose a simple pairwise sigmoid loss for image-text pre-training. Unlike standard
contrastive learning with softmax normalization, the sigmoid loss operates solely on image …
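
A minimal sketch of the pairwise objective this snippet describes, assuming L2-normalized image and text embeddings. In the paper the temperature and bias are learnable scalars; they are fixed here, and averaging over all B^2 pairs (rather than the paper's exact batch normalization) is a simplification.

    import torch
    import torch.nn.functional as F

    def sigmoid_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
        """Every image-text pair is an independent binary classification
        (matched vs. unmatched), so no softmax over the batch is needed."""
        logits = img_emb @ txt_emb.t() * temperature + bias  # (B, B) pair logits
        # +1 on the diagonal (matched pairs), -1 everywhere else.
        labels = 2.0 * torch.eye(logits.size(0), device=logits.device) - 1.0
        return -F.logsigmoid(labels * logits).mean()

    # Toy usage with random unit-norm embeddings.
    img = F.normalize(torch.randn(8, 512), dim=-1)
    txt = F.normalize(torch.randn(8, 512), dim=-1)
    print(sigmoid_loss(img, txt))

Because each pair contributes independently, the loss has no softmax denominator coupling it to the rest of the batch, which is what lets it scale batch size without an all-to-all normalization.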

EVA-02: A visual representation for neon genesis

Y Fang, Q Sun, X Wang, T Huang, X Wang… - Image and Vision …, 2024 - Elsevier
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …
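
The truncated phrase refers to masked image modeling against language-aligned (CLIP-style) teacher features. A sketch under stated assumptions: the linear layers below stand in for the student and frozen teacher encoders, and zero-masking with a cosine reconstruction loss is illustrative, not the paper's exact recipe.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    D = 64
    student = nn.Linear(D, D)         # stand-in for the student encoder
    teacher = nn.Linear(D, D).eval()  # stand-in for a frozen CLIP-like teacher

    def mim_feature_loss(patches, mask_ratio=0.6):
        B, N, _ = patches.shape
        mask = torch.rand(B, N) < mask_ratio                      # True = masked patch
        corrupted = patches.masked_fill(mask.unsqueeze(-1), 0.0)  # hide masked patches
        with torch.no_grad():
            target = F.normalize(teacher(patches), dim=-1)        # language-aligned targets
        pred = F.normalize(student(corrupted), dim=-1)
        # Negative cosine similarity, scored on masked positions only.
        return (1.0 - (pred * target).sum(-1))[mask].mean()

    print(mim_feature_loss(torch.randn(2, 16, D)))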

Sheared LLaMA: Accelerating language model pre-training via structured pruning

M Xia, T Gao, Z Zeng, D Chen - arXiv preprint arXiv:2310.06694, 2023 - arxiv.org
The popularity of LLaMA (Touvron et al., 2023a; b) and other recently emerged moderate-
sized large language models (LLMs) highlights the potential of building smaller yet powerful …

Automated deep learning: Neural architecture search is not the end

X Dong, DJ Kedziora, K Musial… - … and Trends® in …, 2024 - nowpublishers.com
Deep learning (DL) has proven to be a highly effective approach for developing models in
diverse contexts, including visual perception, speech recognition, and machine translation …

On efficient training of large-scale deep learning models: A literature review

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - arXiv preprint arXiv …, 2023 - arxiv.org
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …

Automated model building and protein identification in cryo-EM maps

K Jamali, L Käll, R Zhang, A Brown, D Kimanius… - Nature, 2024 - nature.com
Interpreting electron cryo-microscopy (cryo-EM) maps with atomic models requires high
levels of expertise and labour-intensive manual intervention in three-dimensional computer …

Sophia: A scalable stochastic second-order optimizer for language model pre-training

H Liu, Z Li, D Hall, P Liang, T Ma - arXiv preprint arXiv:2305.14342, 2023 - arxiv.org
Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction in the time and cost of training …
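
A numpy sketch of a Sophia-style step, assuming a diagonal Hessian estimate hess_diag is available; the paper refreshes that estimate only every few steps (via a Hutchinson or Gauss-Newton-Bartlett estimator), and the hyperparameter values below are illustrative.

    import numpy as np

    def sophia_step(theta, m, h, grad, hess_diag, lr=1e-4,
                    beta1=0.96, beta2=0.99, gamma=0.01, eps=1e-12):
        m = beta1 * m + (1 - beta1) * grad         # EMA of gradients (momentum)
        h = beta2 * h + (1 - beta2) * hess_diag    # EMA of the Hessian diagonal
        # Precondition momentum by curvature, then clip elementwise so the
        # per-coordinate step never exceeds lr even where h is near zero.
        step = np.clip(m / np.maximum(gamma * h, eps), -1.0, 1.0)
        return theta - lr * step, m, h

The elementwise clip is what makes the second-order preconditioning safe: coordinates with tiny or unreliable curvature estimates fall back to a bounded, sign-like update rather than blowing up.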

White-box transformers via sparse rate reduction

Y Yu, S Buchanan, D Pai, T Chu, Z Wu… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we contend that the objective of representation learning is to compress and
transform the distribution of the data, say sets of tokens, towards a mixture of low …
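
The snippet breaks off at what the paper calls a mixture of low-dimensional (Gaussian-like) distributions, and the objective it derives its transformer layers from is sparse rate reduction. A schematic LaTeX rendering, reconstructed from memory, with Z = f(X) the token representations, U_{[K]} learned subspace bases, \lambda a sparsity weight, and normalization constants \alpha, \beta (which depend on dimensions and a quantization parameter) left unspecified:

    \max_{f}\; \Delta R\bigl(Z;\, U_{[K]}\bigr) \;-\; \lambda \lVert Z \rVert_{0},
    \qquad Z = f(X),
    \quad\text{where}\quad
    \Delta R(Z;\, U_{[K]}) =
      \underbrace{\tfrac{1}{2}\log\det\!\bigl(I + \alpha\, Z Z^{\top}\bigr)}_{R(Z):\ \text{expand all tokens}}
      \;-\;
      \underbrace{\sum_{k=1}^{K} \tfrac{1}{2}\log\det\!\bigl(I + \beta\, (U_{k}^{\top} Z)(U_{k}^{\top} Z)^{\top}\bigr)}_{R^{c}(Z;\, U_{[K]}):\ \text{compress within each subspace}}

Maximizing the first term spreads tokens apart globally, while minimizing the second compresses them within the learned subspaces; the sparsity penalty pushes the representation toward axis-aligned, parsimonious codes.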