Deep ensembles have been empirically shown to be a promising approach for improving accuracy, uncertainty and out-of-distribution robustness of deep learning models. While …
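Operationally, a deep ensemble is simple: train several copies of one architecture from independent random initializations and average their predictive distributions. A minimal PyTorch sketch, with an illustrative stand-in network and an unspecified training step:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    """Small stand-in classifier; any architecture works."""
    def __init__(self, dim_in=784, n_classes=10):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim_in, 256), nn.ReLU(),
                               nn.Linear(256, n_classes))
    def forward(self, x):
        return self.f(x)

def train_member(seed, train_step_fn, n_steps=1000):
    torch.manual_seed(seed)       # independent random init per member
    model = Net()
    for _ in range(n_steps):
        train_step_fn(model)      # ordinary supervised training, not shown
    return model

def ensemble_predict(models, x):
    # Average the softmax outputs of independently trained members; the
    # mean probability is the ensemble's predictive distribution.
    probs = torch.stack([m(x).softmax(dim=-1) for m in models])
    return probs.mean(dim=0)
```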
In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well …
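The kernel in question is the empirical NTK, Theta(x, x') = J(x) J(x')^T, where J is the Jacobian of the network output with respect to the parameters. A minimal sketch of computing it with torch.func (PyTorch >= 2.0), following the standard Jacobian-contraction recipe; the MLP and batch sizes are illustrative:

```python
import torch
import torch.nn as nn
from torch.func import functional_call, vmap, jacrev

net = nn.Sequential(nn.Linear(10, 512), nn.ReLU(), nn.Linear(512, 1))
params = {k: v.detach() for k, v in net.named_parameters()}

def f(p, x):
    # Network output for a single input, as a pure function of the params.
    return functional_call(net, p, (x.unsqueeze(0),)).squeeze(0)

def empirical_ntk(params, x1, x2):
    j1 = vmap(jacrev(f), (None, 0))(params, x1)   # per-example Jacobians
    j2 = vmap(jacrev(f), (None, 0))(params, x2)
    j1 = [j.flatten(2) for j in j1.values()]      # [N, out, n_params_i]
    j2 = [j.flatten(2) for j in j2.values()]
    # Theta[i, j] = sum over parameter blocks of J(x_i) J(x_j)^T
    return sum(torch.einsum('Naf,Mbf->NMab', a, b) for a, b in zip(j1, j2))

x1, x2 = torch.randn(5, 10), torch.randn(5, 10)
theta = empirical_ntk(params, x1, x2)             # shape [5, 5, 1, 1]
```

Recomputing this kernel at checkpoints along training is the natural way to track how far a finite-width network drifts from the fixed-kernel regime.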
Grokking, the unusual phenomenon for algorithmic datasets where generalization happens long after overfitting the training data, has remained elusive. We aim to understand grokking …
To understand the dynamics of training in deep neural networks, we study the evolution of the Hessian eigenvalue density throughout the optimization process. In non-batch …
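Spectral estimators of this kind are built on Hessian-vector products, which reverse-mode autodiff supplies without ever materializing the Hessian. A minimal sketch of that primitive plus power iteration for the top eigenvalue; the full eigenvalue density is typically estimated with stochastic Lanczos quadrature on the same primitive, and all names here are illustrative:

```python
import torch

def hvp(loss, params, vec):
    # Hessian-vector product via double backprop: Hv = d/dtheta (g . v).
    g = torch.autograd.grad(loss, params, create_graph=True)
    gv = sum((gi * vi).sum() for gi, vi in zip(g, vec))
    return torch.autograd.grad(gv, params)

def top_eigenvalue(loss_fn, params, n_iter=50):
    # Power iteration on the Hessian using only HVPs; loss_fn() must
    # rebuild the computation graph on every call.
    v = [torch.randn_like(p) for p in params]
    for _ in range(n_iter):
        hv = hvp(loss_fn(), params, v)
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / norm for h in hv]
    hv = hvp(loss_fn(), params, v)
    return sum((h * vi).sum() for h, vi in zip(hv, v))  # Rayleigh quotient
```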
Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more …
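In practice the regularizer is an entropy bonus added to the policy-gradient objective, J(pi) = E[sum_t r_t] + beta * E[H(pi(.|s_t))]. A minimal PyTorch sketch; the tensor names and the coefficient `beta` are illustrative:

```python
import torch
from torch.distributions import Categorical

def pg_loss_with_entropy(logits, actions, returns, beta=0.01):
    dist = Categorical(logits=logits)
    pg = -(dist.log_prob(actions) * returns).mean()  # REINFORCE term
    # Subtracting the entropy from the loss rewards stochastic policies;
    # the paper examines how this reshapes the optimization landscape.
    return pg - beta * dist.entropy().mean()
```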
Recent work by Power et al. (2022) highlighted a surprising "grokking" phenomenon in learning arithmetic tasks: a neural net first "memorizes" the training set, resulting in perfect …
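A sketch of the kind of setup in which this is commonly reproduced: modular addition, a small network, strong weight decay, and a very long training horizon. Every hyperparameter below is illustrative, not taken from any cited paper:

```python
import torch
import torch.nn as nn

P = 97                                     # modulus for a + b mod P
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
train_idx, test_idx = perm[: len(perm) // 2], perm[len(perm) // 2:]

embed = nn.Embedding(P, 64)
mlp = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, P))
params = list(embed.parameters()) + list(mlp.parameters())
opt = torch.optim.AdamW(params, lr=1e-3, weight_decay=1.0)

@torch.no_grad()
def accuracy(idx):
    x = embed(pairs[idx]).flatten(1)       # concat the two token embeddings
    return (mlp(x).argmax(-1) == labels[idx]).float().mean().item()

for step in range(100_000):                # grokking appears only at long horizons
    x = embed(pairs[train_idx]).flatten(1)
    loss = nn.functional.cross_entropy(mlp(x), labels[train_idx])
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1000 == 0:
        # Train accuracy saturates early; test accuracy jumps much later.
        print(step, accuracy(train_idx), accuracy(test_idx))
```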
W Li, Y Peng, M Zhang, L Ding, H Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep model fusion/merging is an emerging technique that merges the parameters or predictions of multiple deep learning models into a single one. It combines the abilities of …
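The simplest fusion operator in this family is uniform weight averaging across models that share an architecture; more elaborate schemes align, match, or reweight parameters first. A minimal sketch with illustrative names:

```python
import torch

def average_state_dicts(state_dicts):
    # Uniform average of matching parameter tensors; .float() guards
    # against integer buffers such as BatchNorm counters.
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(0)
            for k in keys}

# Usage (models share one architecture):
# merged = average_state_dicts([m.state_dict() for m in models])
# fused_model.load_state_dict(merged)
```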
V Papyan - Journal of Machine Learning Research, 2020 - jmlr.org
Numerous researchers recently applied empirical spectral analysis to the study of modern deep learning classifiers. We identify and discuss an important formal class/cross-class …
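One concrete version of such an analysis: compare the eigenvalues of the full (centered) feature second-moment matrix with those of its between-class, class-means component, whose spectrum tends to track the outlier eigenvalues in trained classifiers. A minimal sketch; `features` (shape [N, d]) and `labels` are illustrative inputs:

```python
import torch

def spectra(features, labels, n_classes):
    mu = features.mean(0)
    centered = features - mu
    full_cov = centered.T @ centered / len(features)
    means = torch.stack([features[labels == c].mean(0)
                         for c in range(n_classes)])
    between = (means - mu).T @ (means - mu) / n_classes
    # Compare the bulk-plus-outliers of full_cov to the class-means spectrum.
    return torch.linalg.eigvalsh(full_cov), torch.linalg.eigvalsh(between)
```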
In this paper we develop a new perspective on generalization of neural networks by proposing and investigating the concept of a neural network stiffness. We measure how stiff …
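Stiffness between two examples can be read off from the gradients of their individual losses: the sign (or cosine) of the inner product between the two gradient vectors. A minimal sketch; the model, loss, and inputs are illustrative:

```python
import torch

def grad_vector(model, loss_fn, x, y):
    # Flattened parameter gradient of the loss on a single example.
    model.zero_grad()
    loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

def stiffness(model, loss_fn, x1, y1, x2, y2):
    g1 = grad_vector(model, loss_fn, x1, y1)
    g2 = grad_vector(model, loss_fn, x2, y2)
    cos = torch.dot(g1, g2) / (g1.norm() * g2.norm() + 1e-12)
    return torch.sign(cos), cos   # sign stiffness and cosine stiffness
```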