Priors in Bayesian deep learning: A review

V Fortuin - International Statistical Review, 2022 - Wiley Online Library
While the choice of prior is one of the most critical parts of the Bayesian inference workflow,
recent Bayesian deep learning models have often fallen back on vague priors, such as …
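
To make the contrast concrete: a "vague" prior in this setting is typically a wide isotropic Gaussian over all network weights. Below is a minimal sketch of evaluating such a log-prior in PyTorch; the toy network and the `gaussian_log_prior` helper are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn as nn

def gaussian_log_prior(model: nn.Module, sigma: float = 1.0) -> torch.Tensor:
    """Log density of an isotropic Gaussian prior N(0, sigma^2 I) over all
    weights, up to an additive constant (normalizers omitted)."""
    log_p = torch.tensor(0.0)
    for p in model.parameters():
        # Sum the per-parameter Gaussian log densities.
        log_p = log_p + (-0.5 * (p / sigma).pow(2)).sum()
    return log_p

net = nn.Sequential(nn.Linear(10, 50), nn.ReLU(), nn.Linear(50, 1))
# A large sigma makes the prior "vague": it barely constrains the weights.
print(gaussian_log_prior(net, sigma=1.0))
```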

Deep ensembles: A loss landscape perspective

S Fort, H Hu, B Lakshminarayanan - arXiv preprint arXiv:1912.02757, 2019 - arxiv.org
Deep ensembles have been empirically shown to be a promising approach for improving the
accuracy, uncertainty estimation, and out-of-distribution robustness of deep learning models. While …
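
The deep-ensemble recipe itself is simple: train several copies of the same architecture from independent random initializations and average their predictive distributions. A minimal sketch, with the toy data, model, and hyperparameters assumed:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 20)
y = torch.randint(0, 3, (256,))

def train_member(seed: int, epochs: int = 200) -> nn.Module:
    torch.manual_seed(seed)  # each member differs only in its random init
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.cross_entropy(model(X), y).backward()
        opt.step()
    return model

ensemble = [train_member(seed) for seed in range(5)]

with torch.no_grad():
    # Average the softmax outputs, not the logits: the ensemble's predictive
    # distribution is then a proper mixture of the members' distributions.
    probs = torch.stack([m(X).softmax(dim=-1) for m in ensemble]).mean(dim=0)
print(probs[0])
```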

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel

S Fort, GK Dziugaite, M Paul… - Advances in …, 2020 - proceedings.neurips.cc
In suitably initialized wide networks, small learning rates transform deep neural networks
(DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well …
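
The empirical NTK referenced here is the Gram matrix of parameter-space gradients, Θ(x, x′) = ∇_θ f(x)ᵀ ∇_θ f(x′), evaluated at the current weights. A minimal sketch for a scalar-output network; the toy model and inputs are assumptions:

```python
import torch
import torch.nn as nn

def empirical_ntk(model: nn.Module, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """Single NTK entry k(x1, x2) = <df(x1)/dtheta, df(x2)/dtheta>."""
    def grad_vec(x: torch.Tensor) -> torch.Tensor:
        out = model(x.unsqueeze(0)).squeeze()  # scalar output f(x)
        grads = torch.autograd.grad(out, model.parameters())
        return torch.cat([g.reshape(-1) for g in grads])
    return grad_vec(x1) @ grad_vec(x2)

# Wide hidden layer: the regime where the NTK description is most accurate.
net = nn.Sequential(nn.Linear(8, 512), nn.ReLU(), nn.Linear(512, 1))
x_a, x_b = torch.randn(8), torch.randn(8)
print(empirical_ntk(net, x_a, x_b))
```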

Omnigrok: Grokking beyond algorithmic data

Z Liu, EJ Michaud, M Tegmark - The Eleventh International …, 2022 - openreview.net
Grokking, the unusual phenomenon on algorithmic datasets where generalization happens
long after overfitting the training data, has remained elusive. We aim to understand grokking …
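
A common experimental handle in this line of work is the scale of the initial weights: Omnigrok ties grokking to training from a large-norm initialization while weight decay slowly shrinks it. The sketch below is an assumed toy setup (the task, the scale factor alpha, and all hyperparameters are illustrative); grokking would show up as test accuracy catching up long after train accuracy saturates, though whether it appears depends on these choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 30)
y = (X[:, :5].sum(dim=1) > 0).long()  # a toy binary task
X_tr, y_tr, X_te, y_te = X[:256], y[:256], X[256:], y[256:]

model = nn.Sequential(nn.Linear(30, 200), nn.ReLU(), nn.Linear(200, 2))
alpha = 8.0  # scale up the initial weight norm
with torch.no_grad():
    for p in model.parameters():
        p.mul_(alpha)

# Weight decay pulls the (deliberately large) weights back down over training.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)
for step in range(20001):
    opt.zero_grad()
    nn.functional.cross_entropy(model(X_tr), y_tr).backward()
    opt.step()
    if step % 2000 == 0:
        with torch.no_grad():
            acc = lambda X_, y_: (model(X_).argmax(-1) == y_).float().mean().item()
            print(step, f"train={acc(X_tr, y_tr):.2f}", f"test={acc(X_te, y_te):.2f}")
```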

An investigation into neural net optimization via hessian eigenvalue density

B Ghorbani, S Krishnan, Y Xiao - … Conference on Machine …, 2019 - proceedings.mlr.press
To understand the dynamics of training in deep neural networks, we study the evolution of
the Hessian eigenvalue density throughout the optimization process. In non-batch …
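
The spectra in such studies are estimated without ever materializing the Hessian, using Hessian-vector products (fed, e.g., into stochastic Lanczos quadrature). A minimal sketch of the Hessian-vector product via double backpropagation; the toy model and loss are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
X, y = torch.randn(64, 10), torch.randn(64, 1)
loss = nn.functional.mse_loss(model(X), y)

params = list(model.parameters())
# First backward pass, keeping the graph so we can differentiate again.
grads = torch.autograd.grad(loss, params, create_graph=True)
flat_grad = torch.cat([g.reshape(-1) for g in grads])

v = torch.randn_like(flat_grad)  # random probe vector
# Hessian-vector product: differentiate (grad . v) once more w.r.t. the parameters.
hv = torch.autograd.grad(flat_grad @ v, params)
flat_hv = torch.cat([h.reshape(-1) for h in hv])

# Rayleigh quotient v^T H v / v^T v: one scalar summary of curvature. Lanczos
# iterates this matrix-vector map to recover the full eigenvalue density.
print((v @ flat_hv) / (v @ v))
```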

Understanding the impact of entropy on policy optimization

Z Ahmed, N Le Roux, M Norouzi… - … on machine learning, 2019 - proceedings.mlr.press
Entropy regularization is commonly used to improve policy optimization in reinforcement
learning. It is believed to help with exploration by encouraging the selection of more …
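
The regularizer in question is usually just an entropy bonus added to the policy-gradient objective. A minimal REINFORCE-style sketch; the toy policy, dummy returns, and coefficient beta are assumptions:

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
states = torch.randn(16, 4)   # a dummy batch of states
returns = torch.randn(16)     # dummy returns, for illustration only
beta = 0.01                   # entropy coefficient

dist = torch.distributions.Categorical(logits=policy(states))
actions = dist.sample()

# Policy-gradient loss minus an entropy bonus: higher entropy (more uniform
# action distributions) lowers the loss, encouraging exploration.
pg_loss = -(dist.log_prob(actions) * returns).mean()
loss = pg_loss - beta * dist.entropy().mean()
loss.backward()
```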

Dichotomy of early and late phase implicit biases can provably induce grokking

K Lyu, J Jin, Z Li, SS Du, JD Lee, W Hu - arXiv preprint arXiv:2311.18817, 2023 - arxiv.org
Recent work by Power et al. (2022) highlighted a surprising "grokking" phenomenon in
learning arithmetic tasks: a neural net first "memorizes" the training set, resulting in perfect …
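
The setting is the modular-arithmetic task of Power et al. (2022), where the memorization-then-generalization gap is measured directly. A sketch of that setup; the train fraction, weight decay, and step budget are assumed, and whether and when test accuracy jumps depends on them:

```python
import torch
import torch.nn as nn

# Power et al. (2022)-style modular addition: predict (a + b) mod p.
p = 23
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
split = len(pairs) // 2  # assumed 50% train fraction
tr, te = perm[:split], perm[split:]

model = nn.Sequential(nn.Embedding(p, 32), nn.Flatten(),
                      nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, p))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(10001):
    opt.zero_grad()
    nn.functional.cross_entropy(model(pairs[tr]), labels[tr]).backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            acc = lambda idx: (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()
            # Grokking shows up as train accuracy hitting 1.0 long before test does.
            print(step, f"train={acc(tr):.2f}", f"test={acc(te):.2f}")
```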

Deep model fusion: A survey

W Li, Y Peng, M Zhang, L Ding, H Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep model fusion/merging is an emerging technique that merges the parameters or
predictions of multiple deep learning models into a single one. It combines the abilities of …
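
The simplest parameter-level instance is uniform weight averaging of same-architecture models, which is only meaningful when the models share a common initialization or fine-tuning starting point. A minimal sketch under that assumption; the `average_parameters` helper is illustrative:

```python
import copy
import torch
import torch.nn as nn

def average_parameters(models: list[nn.Module]) -> nn.Module:
    """Uniform parameter-space average of same-architecture models."""
    fused = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, param in fused.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in models])
            param.copy_(stacked.mean(dim=0))
    return fused

base = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
# Stand-ins for fine-tuned variants of a shared starting point.
variants = [copy.deepcopy(base) for _ in range(3)]
for m in variants:
    with torch.no_grad():
        for p in m.parameters():
            p.add_(0.01 * torch.randn_like(p))  # simulate divergent fine-tuning
fused = average_parameters(variants)
```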

Traces of class/cross-class structure pervade deep learning spectra

V Papyan - Journal of Machine Learning Research, 2020 - jmlr.org
Numerous researchers have recently applied empirical spectral analysis to the study of modern
deep learning classifiers. We identify and discuss an important formal class/cross-class …
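
One way to see such class structure is to compare the top eigenvalues of the per-example gradient second moment with those of its rank-C class-mean part. A rough sketch under assumed toy data; this decomposition is a simplification of the paper's analysis:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
C, n = 3, 60
# Toy data: three well-separated class clusters.
X = torch.randn(n, 10) + torch.repeat_interleave(torch.randn(C, 10) * 3, n // C, dim=0)
y = torch.repeat_interleave(torch.arange(C), n // C)
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, C))

def example_grad(i: int) -> torch.Tensor:
    loss = nn.functional.cross_entropy(model(X[i:i+1]), y[i:i+1])
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])

G = torch.stack([example_grad(i) for i in range(n)])  # per-example gradients
class_means = torch.stack([G[y == c].mean(dim=0) for c in range(C)])

# Top C eigenvalues of the full gradient second moment vs. the rank-C
# class-mean part: outliers in the full spectrum are largely explained
# by the class-mean structure.
full_eigs = torch.linalg.eigvalsh(G.T @ G / n)[-C:]
mean_eigs = torch.linalg.eigvalsh(class_means.T @ class_means / C)[-C:]
print(full_eigs, mean_eigs)
```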

Stiffness: A new perspective on generalization in neural networks

S Fort, PK Nowak, S Jastrzebski… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper we develop a new perspective on generalization of neural networks by
proposing and investigating the concept of neural network stiffness. We measure how stiff …
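
Stiffness here is the sign or cosine of the inner product between the loss gradients of two examples: positive when a gradient step that helps one example also helps the other. A minimal sketch; the toy model and inputs are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

def loss_grad(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Flattened gradient of the per-example loss w.r.t. all parameters."""
    loss = nn.functional.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])

x1, y1 = torch.randn(10), torch.tensor(0)
x2, y2 = torch.randn(10), torch.tensor(1)
g1, g2 = loss_grad(x1, y1), loss_grad(x2, y2)

# Cosine stiffness: positive when the two examples "move together" under SGD.
cosine_stiffness = torch.nn.functional.cosine_similarity(g1, g2, dim=0)
sign_stiffness = torch.sign(g1 @ g2)
print(cosine_stiffness, sign_stiffness)
```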