Acceleration of stochastic approximation by averaging

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org

Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …

Cited by 825 - Related articles - All 28 versions

[HTML] nih.gov

[HTML] A selective overview of deep learning

J Fan, C Ma, Y Zhong - Statistical science: a review journal of the …, 2021 - ncbi.nlm.nih.gov

Deep learning has achieved tremendous success in recent years. In simple words, deep
learning uses the composition of many nonlinear functions to model the complex …

Cited by 216 - Related articles - All 14 versions

[PDF] mlr.press

Robust speech recognition via large-scale weak supervision

A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press

We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …

Cited by 2727 - Related articles - All 11 versions

[PDF] mlr.press

Hyena hierarchy: Towards larger convolutional language models

M Poli, S Massaroli, E Nguyen, DY Fu… - International …, 2023 - proceedings.mlr.press

Recent advances in deep learning have relied heavily on the use of large Transformers due
to their ability to learn at scale. However, the core building block of Transformers, the …

Cited by 208 - Related articles - All 6 versions

[PDF] thecvf.com

Flatten transformer: Vision transformer using focused linear attention

D Han, X Pan, Y Han, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com

The quadratic computation complexity of self-attention has been a persistent challenge
when applying Transformer models to vision tasks. Linear attention, on the other hand, offers …

Cited by 102 - Related articles - All 5 versions

[PDF] arxiv.org

Eva-02: A visual representation for neon genesis

Y Fang, Q Sun, X Wang, T Huang, X Wang… - Image and Vision …, 2024 - Elsevier

We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …

Cited by 159 - Related articles - All 3 versions

[PDF] thecvf.com

Continual test-time domain adaptation

Q Wang, O Fink, L Van Gool… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Test-time domain adaptation aims to adapt a source pre-trained model to a target domain
without using any source data. Existing works mainly consider the case where the target …

Cited by 385 - Related articles - All 9 versions

[PDF] springer.com

Visual attention network

MH Guo, CZ Lu, ZN Liu, MM Cheng, SM Hu - Computational Visual Media, 2023 - Springer

While originally designed for natural language processing tasks, the self-attention
mechanism has recently taken various computer vision areas by storm. However, the 2D …

Cited by 624 - Related articles - All 8 versions

[PDF] thecvf.com

A convnet for the 2020s

Z Liu, H Mao, CY Wu, C Feichtenhofer… - Proceedings of the …, 2022 - openaccess.thecvf.com

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers
(ViTs), which quickly superseded ConvNets as the state-of-the-art image classification …

Cited by 5362 - Related articles - All 11 versions

[PDF] arxiv.org

Webgpt: Browser-assisted question-answering with human feedback

R Nakano, J Hilton, S Balaji, J Wu, L Ouyang… - arXiv preprint arXiv …, 2021 - arxiv.org

We fine-tune GPT-3 to answer long-form questions using a text-based web-browsing
environment, which allows the model to search and navigate the web. By setting up the task …

Cited by 888 - Related articles - All 8 versions
