J Fan, C Ma, Y Zhong - Statistical science: a review journal of the …, 2021 - ncbi.nlm.nih.gov
Deep learning has achieved tremendous success in recent years. In simple words, deep learning uses the composition of many nonlinear functions to model the complex …
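The snippet's core idea, that a deep network is a composition of many nonlinear functions, is easy to make concrete. Below is a minimal sketch of my own (not from the paper): each layer is a map f_k(x) = ReLU(W_k x + b_k), and the network is the composition f_L ∘ … ∘ f_1. All dimensions and initializations are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(W, b):
    # One nonlinear map f_k(x) = ReLU(W x + b).
    return lambda x: np.maximum(W @ x + b, 0.0)

dims = [8, 16, 16, 1]  # input -> hidden -> hidden -> output (arbitrary choices)
layers = [layer(rng.standard_normal((m, n)) / np.sqrt(n), np.zeros(m))
          for n, m in zip(dims[:-1], dims[1:])]

def network(x):
    # Deep learning's basic structure: compose the nonlinear layer maps.
    for f in layers:
        x = f(x)
    return x

print(network(rng.standard_normal(8)))  # output of the composed map, shape (1,)
```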
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …
Understanding to what extent neural networks memorize training data is an intriguing question with practical and theoretical implications. In this paper we show that in some …
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal …
S Arora, Z Li, A Panigrahi - International Conference on …, 2022 - proceedings.mlr.press
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …
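The stability threshold behind the EoS phase can be shown on a one-dimensional quadratic. This toy sketch is my own construction, not the paper's setup: for loss (λ/2)x², GD contracts iff LR < 2/λ, and the EoS regime is when the sharpness λ hovers near 2/LR.

```python
import numpy as np

def gd_on_quadratic(lam, lr, steps=50, x0=1.0):
    # GD on L(x) = (lam/2) * x**2; the update is x <- (1 - lr*lam) * x.
    x = x0
    for _ in range(steps):
        x -= lr * lam * x  # gradient of (lam/2) * x**2 is lam * x
    return abs(x)

lam = 10.0
for lr in [0.19, 0.20, 0.21]:  # 2/lam = 0.2 is the stability threshold
    print(lr, gd_on_quadratic(lam, lr))
# |x| shrinks for lr < 2/lam, oscillates in place at lr = 2/lam,
# and blows up beyond it.
```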
S Jelassi, M Sander, Y Li - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Vision Transformers (ViTs) have recently achieved comparable or superior performance to convolutional neural networks (CNNs) in computer vision. This empirical …
We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …
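A hedged illustration of the starvation effect the snippet names, using my own toy construction rather than the paper's formalization: two features both predict the label, but the larger-margin feature drives the logits up first, and once the logistic loss saturates, the gradient flowing to the weaker feature is starved.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
y = rng.choice([-1.0, 1.0], size=n)
X = np.stack([2.0 * y, 0.2 * y], axis=1)  # strong feature, weak feature

w = np.zeros(2)
lr = 0.5
for step in range(500):
    margins = y * (X @ w)
    # Gradient of the mean logistic loss log(1 + exp(-margin)):
    # -y * x * sigmoid(-margin), averaged over samples.
    g = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * g

print(w)  # w[0] >> w[1]: the weak but predictive feature is barely learned
```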
S Arora, S Du, W Hu, Z Li… - … Conference on Machine …, 2019 - proceedings.mlr.press
Recent works have cast some light on the mystery of why deep nets fit any data and generalize despite being very overparametrized. This paper analyzes training and …
Interpolators—estimators that achieve zero training error—have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of …
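Interpolation is simplest to see in overparameterized linear regression. A minimal sketch under my own assumptions (random data, more features than samples): the minimum-l2-norm least-squares solution fits the training set exactly, i.e., it is an interpolator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                     # n samples, d > n features
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Moore-Penrose pseudoinverse gives the minimum-norm w with X @ w = y.
w = np.linalg.pinv(X) @ y
print(np.max(np.abs(X @ w - y)))   # ~1e-14: zero training error
```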