When do flat minima optimizers work?

J Kaddour, L Liu, R Silva… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods,
have been shown to improve a neural network's generalization performance over stochastic …
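
Flat-minima methods such as sharpness-aware minimization (SAM) take the descent direction not at the current weights but at an adversarially perturbed point inside a small neighborhood. A minimal sketch of one such update in PyTorch (the function name and the radius rho=0.05 are illustrative assumptions, not code from the paper):

```python
import torch

def sharpness_aware_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    """One SAM-style update: ascend to the (approximate) worst point in an
    L2 ball of radius rho around the weights, then descend from there."""
    # First pass: gradient at the current weights.
    loss_fn(model(x), y).backward()

    # Climb to w + rho * g / ||g|| (stored so we can undo it later).
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
        perturbations = []
        for p in model.parameters():
            if p.grad is None:
                continue
            e = (rho / grad_norm) * p.grad
            p.add_(e)
            perturbations.append((p, e))

    # Second pass: gradient at the perturbed weights drives the actual step.
    model.zero_grad()
    loss_fn(model(x), y).backward()

    # Undo the perturbation, then step with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in perturbations:
            p.sub_(e)
    base_optimizer.step()
    model.zero_grad()
```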

Temperature balancing, layer-wise weight analysis, and neural network training

Y Zhou, T Pang, K Liu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Regularization in modern machine learning is crucial, and it can take various forms in
algorithmic design: training set, model family, error function, regularization terms, and …

Test accuracy vs. generalization gap: Model selection in NLP without accessing training or testing data

Y Yang, R Theisen, L Hodgkinson… - Proceedings of the 29th …, 2023 - dl.acm.org
Selecting suitable architecture parameters and training hyperparameters is essential for
enhancing machine learning (ML) model performance. Several recent empirical studies …

Stochastic weight averaging revisited

H Guo, J Jin, B Liu - Applied Sciences, 2023 - mdpi.com
Averaging neural network weights sampled by a backbone stochastic gradient descent
(SGD) is a simple-yet-effective approach to assist the backbone SGD in finding better …
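
The standard recipe keeps a running average of the weights visited by SGD late in training and then recomputes BatchNorm statistics for the averaged weights. A minimal sketch using PyTorch's built-in SWA utilities (the epoch counts and learning rates are illustrative assumptions):

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

def train_with_swa(model, loader, loss_fn, epochs=100, swa_start=75):
    """Plain SGD for the first epochs, then average the weights it visits."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    swa_model = AveragedModel(model)              # running average of weights
    swa_scheduler = SWALR(optimizer, swa_lr=0.05)

    for epoch in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        if epoch >= swa_start:
            swa_model.update_parameters(model)    # fold current weights into the average
            swa_scheduler.step()

    update_bn(loader, swa_model)                  # refresh BatchNorm stats for the averaged weights
    return swa_model
```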

When are ensembles really effective?

R Theisen, H Kim, Y Yang… - Advances in …, 2024 - proceedings.neurips.cc
Ensembling has a long history in statistical data analysis, with many impactful applications.
However, in many modern machine learning settings, the benefits of ensembling are less …
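
The simplest ensembling recipe averages the members' predicted probabilities; it helps roughly to the extent the members disagree, since shared mistakes survive averaging. A minimal sketch (probability averaging over softmax outputs is an illustrative choice, not the paper's exact protocol):

```python
import torch

@torch.no_grad()
def ensemble_predict(models, x):
    """Average class probabilities of independently trained classifiers.

    Assumes each model returns raw logits for the same label set.
    """
    probs = torch.stack([m(x).softmax(dim=-1) for m in models])
    return probs.mean(dim=0)
```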

Understanding robust learning through the lens of representation similarities

C Cianfarani, AN Bhagoji, V Sehwag… - Advances in …, 2022 - proceedings.neurips.cc
Representation learning, i.e. the generation of representations useful for
downstream applications, is a task of fundamental importance that underlies much of the …
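
A common way to compare learned representations across networks is linear centered kernel alignment (CKA); a minimal sketch (not necessarily the exact variant or preprocessing used in the paper):

```python
import torch

def linear_cka(X, Y):
    """Linear CKA between two activation matrices of shape (n_examples, features)."""
    X = X - X.mean(dim=0, keepdim=True)   # center each feature dimension
    Y = Y - Y.mean(dim=0, keepdim=True)
    cross = (X.T @ Y).norm() ** 2         # ||X^T Y||_F^2
    self_x = (X.T @ X).norm()             # ||X^T X||_F
    self_y = (Y.T @ Y).norm()             # ||Y^T Y||_F
    return cross / (self_x * self_y)      # 1 = identical up to rotation/scale, 0 = unrelated
```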

Minimum norm interpolation by perceptra: Explicit regularization and implicit bias

J Park, I Pelakh, S Wojtowytsch - Advances in Neural …, 2023 - proceedings.neurips.cc
We investigate how shallow ReLU networks interpolate between known regions. Our
analysis shows that empirical risk minimizers converge to a minimum norm interpolant as …
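
Schematically, a minimum norm interpolant is a solution of the constrained problem below (a generic statement for illustration; the specific norm, function class, and sense of convergence are those defined in the paper):

```latex
\min_{\theta}\; \|\theta\| \quad \text{s.t.} \quad f_{\theta}(x_i) = y_i \ \text{for all training pairs } (x_i, y_i).
```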

Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data

Y Yang, R Theisen, L Hodgkinson, JE Gonzalez… - arXiv preprint arXiv …, 2022 - arxiv.org
Selecting suitable architecture parameters and training hyperparameters is essential for
enhancing machine learning (ML) model performance. Several recent empirical studies …
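
Such generalization metrics are computed from the trained weights alone, with no forward passes over training or test data. One illustrative weight-only quantity (an assumed example, not necessarily one of the paper's metrics) is the per-layer log spectral norm:

```python
import torch

@torch.no_grad()
def log_spectral_norms(model):
    """Per-layer log spectral norm, computed from the weights alone (no data)."""
    norms = {}
    for name, p in model.named_parameters():
        if p.ndim < 2:
            continue                                   # skip biases and other 1-D parameters
        W = p.flatten(1)                               # treat conv kernels as matrices
        sigma_max = torch.linalg.matrix_norm(W, ord=2) # largest singular value
        norms[name] = torch.log(sigma_max).item()
    return norms
```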

A three-regime model of network pruning

Y Zhou, Y Yang, A Chang… - … on Machine Learning, 2023 - proceedings.mlr.press
Recent work has highlighted the complex influence training hyperparameters, e.g., the
number of training epochs, can have on the prunability of machine learning models …
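
Prunability in this line of work is typically probed with magnitude pruning, which removes the smallest-magnitude weights. A minimal global magnitude-pruning sketch in PyTorch (the 50% sparsity level is an arbitrary illustration, not a value from the paper):

```python
import torch
import torch.nn.utils.prune as prune

def magnitude_prune(model, amount=0.5):
    """Globally prune the smallest-magnitude weights across Linear/Conv2d layers."""
    params = [
        (m, "weight")
        for m in model.modules()
        if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d))
    ]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)
    # Make the pruning permanent by folding the binary masks into the weights.
    for m, name in params:
        prune.remove(m, name)
    return model
```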

Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond

A Jeffares, A Curth, M van der Schaar - arXiv preprint arXiv:2411.00247, 2024 - arxiv.org
Deep learning sometimes appears to work in unexpected ways. In pursuit of a deeper
understanding of its surprising behaviors, we investigate the utility of a simple yet accurate …