Priors in Bayesian deep learning: A review

V Fortuin - International Statistical Review, 2022 - Wiley Online Library
While the choice of prior is one of the most critical parts of the Bayesian inference workflow,
recent Bayesian deep learning models have often fallen back on vague priors, such as …
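
To make the contrast concrete: a "vague" prior in this setting is typically a wide isotropic Gaussian over all network weights. Below is a minimal sketch of evaluating such a log-prior in PyTorch; the toy network and the `gaussian_log_prior` helper are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn as nn

def gaussian_log_prior(model: nn.Module, sigma: float = 1.0) -> torch.Tensor:
    """Log density of an isotropic Gaussian prior N(0, sigma^2 I) over all
    weights, up to an additive constant (normalizers omitted)."""
    log_p = torch.tensor(0.0)
    for p in model.parameters():
        # Sum the per-parameter Gaussian log densities.
        log_p = log_p + (-0.5 * (p / sigma).pow(2)).sum()
    return log_p

net = nn.Sequential(nn.Linear(10, 50), nn.ReLU(), nn.Linear(50, 1))
# A large sigma makes the prior "vague": it barely constrains the weights.
print(gaussian_log_prior(net, sigma=1.0))
```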

Deep ensembles: A loss landscape perspective

S Fort, H Hu, B Lakshminarayanan - arXiv preprint arXiv:1912.02757, 2019 - arxiv.org
Deep ensembles have been empirically shown to be a promising approach for improving the
accuracy, uncertainty estimation, and out-of-distribution robustness of deep learning models. While …
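
The deep-ensemble recipe itself is simple: train several copies of the same architecture from independent random initializations and average their predictive distributions. A minimal sketch, with the toy data, model, and hyperparameters assumed:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 20)
y = torch.randint(0, 3, (256,))

def train_member(seed: int, epochs: int = 200) -> nn.Module:
    torch.manual_seed(seed)  # each member differs only in its random init
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.cross_entropy(model(X), y).backward()
        opt.step()
    return model

ensemble = [train_member(seed) for seed in range(5)]

with torch.no_grad():
    # Average the softmax outputs, not the logits: the ensemble's predictive
    # distribution is then a proper mixture of the members' distributions.
    probs = torch.stack([m(X).softmax(dim=-1) for m in ensemble]).mean(dim=0)
print(probs[0])
```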

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel

S Fort, GK Dziugaite, M Paul… - Advances in …, 2020 - proceedings.neurips.cc
In suitably initialized wide networks, small learning rates transform deep neural networks
(DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well …
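
The empirical NTK referenced here is the Gram matrix of parameter-space gradients, Θ(x, x′) = ∇_θ f(x)ᵀ ∇_θ f(x′), evaluated at the current weights. A minimal sketch for a scalar-output network; the toy model and inputs are assumptions:

```python
import torch
import torch.nn as nn

def empirical_ntk(model: nn.Module, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """Single NTK entry k(x1, x2) = <df(x1)/dtheta, df(x2)/dtheta>."""
    def grad_vec(x: torch.Tensor) -> torch.Tensor:
        out = model(x.unsqueeze(0)).squeeze()  # scalar output f(x)
        grads = torch.autograd.grad(out, model.parameters())
        return torch.cat([g.reshape(-1) for g in grads])
    return grad_vec(x1) @ grad_vec(x2)

# Wide hidden layer: the regime where the NTK description is most accurate.
net = nn.Sequential(nn.Linear(8, 512), nn.ReLU(), nn.Linear(512, 1))
x_a, x_b = torch.randn(8), torch.randn(8)
print(empirical_ntk(net, x_a, x_b))
```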

Omnigrok: Grokking beyond algorithmic data

Z Liu, EJ Michaud, M Tegmark - The Eleventh International …, 2022 - openreview.net
Grokking, the unusual phenomenon on algorithmic datasets where generalization happens
long after overfitting the training data, has remained elusive. We aim to understand grokking …
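
A common experimental handle in this line of work is the scale of the initial weights: Omnigrok ties grokking to training from a large-norm initialization while weight decay slowly shrinks it. The sketch below is an assumed toy setup (the task, the scale factor alpha, and all hyperparameters are illustrative); grokking would show up as test accuracy catching up long after train accuracy saturates, though whether it appears depends on these choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 30)
y = (X[:, :5].sum(dim=1) > 0).long()  # a toy binary task
X_tr, y_tr, X_te, y_te = X[:256], y[:256], X[256:], y[256:]

model = nn.Sequential(nn.Linear(30, 200), nn.ReLU(), nn.Linear(200, 2))
alpha = 8.0  # scale up the initial weight norm
with torch.no_grad():
    for p in model.parameters():
        p.mul_(alpha)

# Weight decay pulls the (deliberately large) weights back down over training.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)
for step in range(20001):
    opt.zero_grad()
    nn.functional.cross_entropy(model(X_tr), y_tr).backward()
    opt.step()
    if step % 2000 == 0:
        with torch.no_grad():
            acc = lambda X_, y_: (model(X_).argmax(-1) == y_).float().mean().item()
            print(step, f"train={acc(X_tr, y_tr):.2f}", f"test={acc(X_te, y_te):.2f}")
```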

An investigation into neural net optimization via hessian eigenvalue density

B Ghorbani, S Krishnan, Y Xiao - … Conference on Machine …, 2019 - proceedings.mlr.press
To understand the dynamics of training in deep neural networks, we study the evolution of
the Hessian eigenvalue density throughout the optimization process. In non-batch …
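
The spectra in such studies are estimated without ever materializing the Hessian, using Hessian-vector products (fed, e.g., into stochastic Lanczos quadrature). A minimal sketch of the Hessian-vector product via double backpropagation; the toy model and loss are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
X, y = torch.randn(64, 10), torch.randn(64, 1)
loss = nn.functional.mse_loss(model(X), y)

params = list(model.parameters())
# First backward pass, keeping the graph so we can differentiate again.
grads = torch.autograd.grad(loss, params, create_graph=True)
flat_grad = torch.cat([g.reshape(-1) for g in grads])

v = torch.randn_like(flat_grad)  # random probe vector
# Hessian-vector product: differentiate (grad . v) once more w.r.t. the parameters.
hv = torch.autograd.grad(flat_grad @ v, params)
flat_hv = torch.cat([h.reshape(-1) for h in hv])

# Rayleigh quotient v^T H v / v^T v: one scalar summary of curvature. Lanczos
# iterates this matrix-vector map to recover the full eigenvalue density.
print((v @ flat_hv) / (v @ v))
```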

Understanding the impact of entropy on policy optimization

Z Ahmed, N Le Roux, M Norouzi… - … on machine learning, 2019 - proceedings.mlr.press
Entropy regularization is commonly used to improve policy optimization in reinforcement
learning. It is believed to help with exploration by encouraging the selection of more …
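
The regularizer in question is usually just an entropy bonus added to the policy-gradient objective. A minimal REINFORCE-style sketch; the toy policy, dummy returns, and coefficient beta are assumptions:

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
states = torch.randn(16, 4)   # a dummy batch of states
returns = torch.randn(16)     # dummy returns, for illustration only
beta = 0.01                   # entropy coefficient

dist = torch.distributions.Categorical(logits=policy(states))
actions = dist.sample()

# Policy-gradient loss minus an entropy bonus: higher entropy (more uniform
# action distributions) lowers the loss, encouraging exploration.
pg_loss = -(dist.log_prob(actions) * returns).mean()
loss = pg_loss - beta * dist.entropy().mean()
loss.backward()
```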

Dichotomy of early and late phase implicit biases can provably induce grokking

K Lyu, J Jin, Z Li, SS Du, JD Lee, W Hu - arXiv preprint arXiv:2311.18817, 2023 - arxiv.org
Recent work by Power et al. (2022) highlighted a surprising "grokking" phenomenon in
learning arithmetic tasks: a neural net first "memorizes" the training set, resulting in perfect …
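
The setting is the modular-arithmetic task of Power et al. (2022), where the memorization-then-generalization gap is measured directly. A sketch of that setup; the train fraction, weight decay, and step budget are assumed, and whether and when test accuracy jumps depends on them:

```python
import torch
import torch.nn as nn

# Power et al. (2022)-style modular addition: predict (a + b) mod p.
p = 23
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
split = len(pairs) // 2  # assumed 50% train fraction
tr, te = perm[:split], perm[split:]

model = nn.Sequential(nn.Embedding(p, 32), nn.Flatten(),
                      nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, p))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(10001):
    opt.zero_grad()
    nn.functional.cross_entropy(model(pairs[tr]), labels[tr]).backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            acc = lambda idx: (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()
            # Grokking shows up as train accuracy hitting 1.0 long before test does.
            print(step, f"train={acc(tr):.2f}", f"test={acc(te):.2f}")
```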

Deep model fusion: A survey

W Li, Y Peng, M Zhang, L Ding, H Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep model fusion/merging is an emerging technique that merges the parameters or
predictions of multiple deep learning models into a single one. It combines the abilities of …
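
The simplest parameter-level instance is uniform weight averaging of same-architecture models, which is only meaningful when the models share a common initialization or fine-tuning starting point. A minimal sketch under that assumption; the `average_parameters` helper is illustrative:

```python
import copy
import torch
import torch.nn as nn

def average_parameters(models: list[nn.Module]) -> nn.Module:
    """Uniform parameter-space average of same-architecture models."""
    fused = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, param in fused.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in models])
            param.copy_(stacked.mean(dim=0))
    return fused

base = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
# Stand-ins for fine-tuned variants of a shared starting point.
variants = [copy.deepcopy(base) for _ in range(3)]
for m in variants:
    with torch.no_grad():
        for p in m.parameters():
            p.add_(0.01 * torch.randn_like(p))  # simulate divergent fine-tuning
fused = average_parameters(variants)
```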

Traces of class/cross-class structure pervade deep learning spectra

V Papyan - Journal of Machine Learning Research, 2020 - jmlr.org
Numerous researchers have recently applied empirical spectral analysis to the study of modern
deep learning classifiers. We identify and discuss an important formal class/cross-class …
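
One way to see such class structure is to compare the top eigenvalues of the per-example gradient second moment with those of its rank-C class-mean part. A rough sketch under assumed toy data; this decomposition is a simplification of the paper's analysis:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
C, n = 3, 60
# Toy data: three well-separated class clusters.
X = torch.randn(n, 10) + torch.repeat_interleave(torch.randn(C, 10) * 3, n // C, dim=0)
y = torch.repeat_interleave(torch.arange(C), n // C)
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, C))

def example_grad(i: int) -> torch.Tensor:
    loss = nn.functional.cross_entropy(model(X[i:i+1]), y[i:i+1])
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])

G = torch.stack([example_grad(i) for i in range(n)])  # per-example gradients
class_means = torch.stack([G[y == c].mean(dim=0) for c in range(C)])

# Top C eigenvalues of the full gradient second moment vs. the rank-C
# class-mean part: outliers in the full spectrum are largely explained
# by the class-mean structure.
full_eigs = torch.linalg.eigvalsh(G.T @ G / n)[-C:]
mean_eigs = torch.linalg.eigvalsh(class_means.T @ class_means / C)[-C:]
print(full_eigs, mean_eigs)
```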

Stiffness: A new perspective on generalization in neural networks

S Fort, PK Nowak, S Jastrzebski… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper we develop a new perspective on generalization of neural networks by
proposing and investigating the concept of neural network stiffness. We measure how stiff …
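
Stiffness here is the sign or cosine of the inner product between the loss gradients of two examples: positive when a gradient step that helps one example also helps the other. A minimal sketch; the toy model and inputs are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

def loss_grad(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Flattened gradient of the per-example loss w.r.t. all parameters."""
    loss = nn.functional.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])

x1, y1 = torch.randn(10), torch.tensor(0)
x2, y2 = torch.randn(10), torch.tensor(1)
g1, g2 = loss_grad(x1, y1), loss_grad(x2, y2)

# Cosine stiffness: positive when the two examples "move together" under SGD.
cosine_stiffness = torch.nn.functional.cosine_similarity(g1, g2, dim=0)
sign_stiffness = torch.sign(g1 @ g2)
print(cosine_stiffness, sign_stiffness)
```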