DINOv2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

Mathematical discoveries from program search with large language models

B Romera-Paredes, M Barekatain, A Novikov, M Balog… - Nature, 2024 - nature.com
Large language models (LLMs) have demonstrated tremendous capabilities in solving
complex tasks, from quantitative reasoning to understanding natural language. However …

Sigmoid loss for language image pre-training

X Zhai, B Mustafa, A Kolesnikov… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose a simple pairwise sigmoid loss for image-text pre-training. Unlike standard
contrastive learning with softmax normalization, the sigmoid loss operates solely on image …
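
A minimal sketch of the pairwise objective this snippet describes, assuming L2-normalized image and text embeddings. In the paper the temperature and bias are learnable scalars; they are fixed here, and averaging over all B^2 pairs (rather than the paper's exact batch normalization) is a simplification.

    import torch
    import torch.nn.functional as F

    def sigmoid_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
        """Every image-text pair is an independent binary classification
        (matched vs. unmatched), so no softmax over the batch is needed."""
        logits = img_emb @ txt_emb.t() * temperature + bias  # (B, B) pair logits
        # +1 on the diagonal (matched pairs), -1 everywhere else.
        labels = 2.0 * torch.eye(logits.size(0), device=logits.device) - 1.0
        return -F.logsigmoid(labels * logits).mean()

    # Toy usage with random unit-norm embeddings.
    img = F.normalize(torch.randn(8, 512), dim=-1)
    txt = F.normalize(torch.randn(8, 512), dim=-1)
    print(sigmoid_loss(img, txt))

Because each pair contributes independently, the loss has no softmax denominator coupling it to the rest of the batch, which is what lets it scale batch size without an all-to-all normalization.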

EVA-02: A visual representation for neon genesis

Y Fang, Q Sun, X Wang, T Huang, X Wang… - Image and Vision …, 2024 - Elsevier
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …
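
The truncated phrase refers to masked image modeling against language-aligned (CLIP-style) teacher features. A sketch under stated assumptions: the linear layers below stand in for the student and frozen teacher encoders, and zero-masking with a cosine reconstruction loss is illustrative, not the paper's exact recipe.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    D = 64
    student = nn.Linear(D, D)         # stand-in for the student encoder
    teacher = nn.Linear(D, D).eval()  # stand-in for a frozen CLIP-like teacher

    def mim_feature_loss(patches, mask_ratio=0.6):
        B, N, _ = patches.shape
        mask = torch.rand(B, N) < mask_ratio                      # True = masked patch
        corrupted = patches.masked_fill(mask.unsqueeze(-1), 0.0)  # hide masked patches
        with torch.no_grad():
            target = F.normalize(teacher(patches), dim=-1)        # language-aligned targets
        pred = F.normalize(student(corrupted), dim=-1)
        # Negative cosine similarity, scored on masked positions only.
        return (1.0 - (pred * target).sum(-1))[mask].mean()

    print(mim_feature_loss(torch.randn(2, 16, D)))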

Sheared LLaMA: Accelerating language model pre-training via structured pruning

M Xia, T Gao, Z Zeng, D Chen - arXiv preprint arXiv:2310.06694, 2023 - arxiv.org
The popularity of LLaMA (Touvron et al., 2023a; b) and other recently emerged moderate-
sized large language models (LLMs) highlights the potential of building smaller yet powerful …

Automated deep learning: Neural architecture search is not the end

X Dong, DJ Kedziora, K Musial… - … and Trends® in …, 2024 - nowpublishers.com
Deep learning (DL) has proven to be a highly effective approach for developing models in
diverse contexts, including visual perception, speech recognition, and machine translation …

On efficient training of large-scale deep learning models: A literature review

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - arXiv preprint arXiv …, 2023 - arxiv.org
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …

Automated model building and protein identification in cryo-EM maps

K Jamali, L Käll, R Zhang, A Brown, D Kimanius… - Nature, 2024 - nature.com
Interpreting electron cryo-microscopy (cryo-EM) maps with atomic models requires high
levels of expertise and labour-intensive manual intervention in three-dimensional computer …

Sophia: A scalable stochastic second-order optimizer for language model pre-training

H Liu, Z Li, D Hall, P Liang, T Ma - arXiv preprint arXiv:2305.14342, 2023 - arxiv.org
Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction in the time and cost of training …
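
A numpy sketch of a Sophia-style step, assuming a diagonal Hessian estimate hess_diag is available; the paper refreshes that estimate only every few steps (via a Hutchinson or Gauss-Newton-Bartlett estimator), and the hyperparameter values below are illustrative.

    import numpy as np

    def sophia_step(theta, m, h, grad, hess_diag, lr=1e-4,
                    beta1=0.96, beta2=0.99, gamma=0.01, eps=1e-12):
        m = beta1 * m + (1 - beta1) * grad         # EMA of gradients (momentum)
        h = beta2 * h + (1 - beta2) * hess_diag    # EMA of the Hessian diagonal
        # Precondition momentum by curvature, then clip elementwise so the
        # per-coordinate step never exceeds lr even where h is near zero.
        step = np.clip(m / np.maximum(gamma * h, eps), -1.0, 1.0)
        return theta - lr * step, m, h

The elementwise clip is what makes the second-order preconditioning safe: coordinates with tiny or unreliable curvature estimates fall back to a bounded, sign-like update rather than blowing up.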

White-box transformers via sparse rate reduction

Y Yu, S Buchanan, D Pai, T Chu, Z Wu… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we contend that the objective of representation learning is to compress and
transform the distribution of the data, say sets of tokens, towards a mixture of low …
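
The snippet breaks off at what the paper calls a mixture of low-dimensional (Gaussian-like) distributions, and the objective it derives its transformer layers from is sparse rate reduction. A schematic LaTeX rendering, reconstructed from memory, with Z = f(X) the token representations, U_{[K]} learned subspace bases, \lambda a sparsity weight, and normalization constants \alpha, \beta (which depend on dimensions and a quantization parameter) left unspecified:

    \max_{f}\; \Delta R\bigl(Z;\, U_{[K]}\bigr) \;-\; \lambda \lVert Z \rVert_{0},
    \qquad Z = f(X),
    \quad\text{where}\quad
    \Delta R(Z;\, U_{[K]}) =
      \underbrace{\tfrac{1}{2}\log\det\!\bigl(I + \alpha\, Z Z^{\top}\bigr)}_{R(Z):\ \text{expand all tokens}}
      \;-\;
      \underbrace{\sum_{k=1}^{K} \tfrac{1}{2}\log\det\!\bigl(I + \beta\, (U_{k}^{\top} Z)(U_{k}^{\top} Z)^{\top}\bigr)}_{R^{c}(Z;\, U_{[K]}):\ \text{compress within each subspace}}

Maximizing the first term spreads tokens apart globally, while minimizing the second compresses them within the learned subspaces; the sparsity penalty pushes the representation toward axis-aligned, parsimonious codes.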