J Fan, C Ma, Y Zhong - Statistical science: a review journal of the …, 2021 - ncbi.nlm.nih.gov
Deep learning has achieved tremendous success in recent years. In simple words, deep learning uses the composition of many nonlinear functions to model the complex …
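The snippet's core idea, that a deep network is a composition of many nonlinear functions, is easy to make concrete. Below is a minimal sketch of my own (not from the paper): each layer is a map f_k(x) = ReLU(W_k x + b_k), and the network is the composition f_L ∘ … ∘ f_1. All dimensions and initializations are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(W, b):
    # One nonlinear map f_k(x) = ReLU(W x + b).
    return lambda x: np.maximum(W @ x + b, 0.0)

dims = [8, 16, 16, 1]  # input -> hidden -> hidden -> output (arbitrary choices)
layers = [layer(rng.standard_normal((m, n)) / np.sqrt(n), np.zeros(m))
          for n, m in zip(dims[:-1], dims[1:])]

def network(x):
    # Deep learning's basic structure: compose the nonlinear layer maps.
    for f in layers:
        x = f(x)
    return x

print(network(rng.standard_normal(8)))  # output of the composed map, shape (1,)
```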
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …
Understanding to what extent neural networks memorize training data is an intriguing question with practical and theoretical implications. In this paper we show that in some …
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal …
S Arora, Z Li, A Panigrahi - International Conference on …, 2022 - proceedings.mlr.press
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …
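The stability threshold behind the EoS phase can be shown on a one-dimensional quadratic. This toy sketch is my own construction, not the paper's setup: for loss (λ/2)x², GD contracts iff LR < 2/λ, and the EoS regime is when the sharpness λ hovers near 2/LR.

```python
import numpy as np

def gd_on_quadratic(lam, lr, steps=50, x0=1.0):
    # GD on L(x) = (lam/2) * x**2; the update is x <- (1 - lr*lam) * x.
    x = x0
    for _ in range(steps):
        x -= lr * lam * x  # gradient of (lam/2) * x**2 is lam * x
    return abs(x)

lam = 10.0
for lr in [0.19, 0.20, 0.21]:  # 2/lam = 0.2 is the stability threshold
    print(lr, gd_on_quadratic(lam, lr))
# |x| shrinks for lr < 2/lam, oscillates in place at lr = 2/lam,
# and blows up beyond it.
```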
S Jelassi, M Sander, Y Li - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Vision Transformers (ViTs) have recently achieved comparable or superior performance to convolutional neural networks (CNNs) in computer vision. This empirical …
We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …
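A hedged illustration of the starvation effect the snippet names, using my own toy construction rather than the paper's formalization: two features both predict the label, but the larger-margin feature drives the logits up first, and once the logistic loss saturates, the gradient flowing to the weaker feature is starved.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
y = rng.choice([-1.0, 1.0], size=n)
X = np.stack([2.0 * y, 0.2 * y], axis=1)  # strong feature, weak feature

w = np.zeros(2)
lr = 0.5
for step in range(500):
    margins = y * (X @ w)
    # Gradient of the mean logistic loss log(1 + exp(-margin)):
    # -y * x * sigmoid(-margin), averaged over samples.
    g = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * g

print(w)  # w[0] >> w[1]: the weak but predictive feature is barely learned
```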
S Arora, S Du, W Hu, Z Li… - … Conference on Machine …, 2019 - proceedings.mlr.press
Recent works have cast some light on the mystery of why deep nets fit any data and generalize despite being very overparametrized. This paper analyzes training and …
Interpolators—estimators that achieve zero training error—have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of …
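Interpolation is simplest to see in overparameterized linear regression. A minimal sketch under my own assumptions (random data, more features than samples): the minimum-l2-norm least-squares solution fits the training set exactly, i.e., it is an interpolator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                     # n samples, d > n features
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Moore-Penrose pseudoinverse gives the minimum-norm w with X @ w = y.
w = np.linalg.pinv(X) @ y
print(np.max(np.abs(X @ w - y)))   # ~1e-14: zero training error
```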