When do flat minima optimizers work?

J Kaddour, L Liu, R Silva… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods,
have been shown to improve a neural network's generalization performance over stochastic …
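
Flat-minima methods such as sharpness-aware minimization (SAM) take the descent direction not at the current weights but at an adversarially perturbed point inside a small neighborhood. A minimal sketch of one such update in PyTorch (the function name and the radius rho=0.05 are illustrative assumptions, not code from the paper):

```python
import torch

def sharpness_aware_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    """One SAM-style update: ascend to the (approximate) worst point in an
    L2 ball of radius rho around the weights, then descend from there."""
    # First pass: gradient at the current weights.
    loss_fn(model(x), y).backward()

    # Climb to w + rho * g / ||g|| (stored so we can undo it later).
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
        perturbations = []
        for p in model.parameters():
            if p.grad is None:
                continue
            e = (rho / grad_norm) * p.grad
            p.add_(e)
            perturbations.append((p, e))

    # Second pass: gradient at the perturbed weights drives the actual step.
    model.zero_grad()
    loss_fn(model(x), y).backward()

    # Undo the perturbation, then step with the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in perturbations:
            p.sub_(e)
    base_optimizer.step()
    model.zero_grad()
```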

Temperature balancing, layer-wise weight analysis, and neural network training

Y Zhou, T Pang, K Liu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Regularization in modern machine learning is crucial, and it can take various forms in
algorithmic design: training set, model family, error function, regularization terms, and …

Test accuracy vs. generalization gap: Model selection in NLP without accessing training or testing data

Y Yang, R Theisen, L Hodgkinson… - Proceedings of the 29th …, 2023 - dl.acm.org
Selecting suitable architecture parameters and training hyperparameters is essential for
enhancing machine learning (ML) model performance. Several recent empirical studies …

Stochastic weight averaging revisited

H Guo, J Jin, B Liu - Applied Sciences, 2023 - mdpi.com
Averaging neural network weights sampled by a backbone stochastic gradient descent
(SGD) is a simple-yet-effective approach to assist the backbone SGD in finding better …
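
The standard recipe keeps a running average of the weights visited by SGD late in training and then recomputes BatchNorm statistics for the averaged weights. A minimal sketch using PyTorch's built-in SWA utilities (the epoch counts and learning rates are illustrative assumptions):

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

def train_with_swa(model, loader, loss_fn, epochs=100, swa_start=75):
    """Plain SGD for the first epochs, then average the weights it visits."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    swa_model = AveragedModel(model)              # running average of weights
    swa_scheduler = SWALR(optimizer, swa_lr=0.05)

    for epoch in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        if epoch >= swa_start:
            swa_model.update_parameters(model)    # fold current weights into the average
            swa_scheduler.step()

    update_bn(loader, swa_model)                  # refresh BatchNorm stats for the averaged weights
    return swa_model
```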

When are ensembles really effective?

R Theisen, H Kim, Y Yang… - Advances in …, 2024 - proceedings.neurips.cc
Ensembling has a long history in statistical data analysis, with many impactful applications.
However, in many modern machine learning settings, the benefits of ensembling are less …
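
The simplest ensembling recipe averages the members' predicted probabilities; it helps roughly to the extent the members disagree, since shared mistakes survive averaging. A minimal sketch (probability averaging over softmax outputs is an illustrative choice, not the paper's exact protocol):

```python
import torch

@torch.no_grad()
def ensemble_predict(models, x):
    """Average class probabilities of independently trained classifiers.

    Assumes each model returns raw logits for the same label set.
    """
    probs = torch.stack([m(x).softmax(dim=-1) for m in models])
    return probs.mean(dim=0)
```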

Understanding robust learning through the lens of representation similarities

C Cianfarani, AN Bhagoji, V Sehwag… - Advances in …, 2022 - proceedings.neurips.cc
Representation learning, i.e. the generation of representations useful for
downstream applications, is a task of fundamental importance that underlies much of the …
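
A common way to compare learned representations across networks is linear centered kernel alignment (CKA); a minimal sketch (not necessarily the exact variant or preprocessing used in the paper):

```python
import torch

def linear_cka(X, Y):
    """Linear CKA between two activation matrices of shape (n_examples, features)."""
    X = X - X.mean(dim=0, keepdim=True)   # center each feature dimension
    Y = Y - Y.mean(dim=0, keepdim=True)
    cross = (X.T @ Y).norm() ** 2         # ||X^T Y||_F^2
    self_x = (X.T @ X).norm()             # ||X^T X||_F
    self_y = (Y.T @ Y).norm()             # ||Y^T Y||_F
    return cross / (self_x * self_y)      # 1 = identical up to rotation/scale, 0 = unrelated
```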

Minimum norm interpolation by perceptra: Explicit regularization and implicit bias

J Park, I Pelakh, S Wojtowytsch - Advances in Neural …, 2023 - proceedings.neurips.cc
We investigate how shallow ReLU networks interpolate between known regions. Our
analysis shows that empirical risk minimizers converge to a minimum norm interpolant as …
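
Schematically, a minimum norm interpolant is a solution of the constrained problem below (a generic statement for illustration; the specific norm, function class, and sense of convergence are those defined in the paper):

```latex
\min_{\theta}\; \|\theta\| \quad \text{s.t.} \quad f_{\theta}(x_i) = y_i \ \text{for all training pairs } (x_i, y_i).
```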

Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data

Y Yang, R Theisen, L Hodgkinson, JE Gonzalez… - arXiv preprint arXiv …, 2022 - arxiv.org
Selecting suitable architecture parameters and training hyperparameters is essential for
enhancing machine learning (ML) model performance. Several recent empirical studies …
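
Such generalization metrics are computed from the trained weights alone, with no forward passes over training or test data. One illustrative weight-only quantity (an assumed example, not necessarily one of the paper's metrics) is the per-layer log spectral norm:

```python
import torch

@torch.no_grad()
def log_spectral_norms(model):
    """Per-layer log spectral norm, computed from the weights alone (no data)."""
    norms = {}
    for name, p in model.named_parameters():
        if p.ndim < 2:
            continue                                   # skip biases and other 1-D parameters
        W = p.flatten(1)                               # treat conv kernels as matrices
        sigma_max = torch.linalg.matrix_norm(W, ord=2) # largest singular value
        norms[name] = torch.log(sigma_max).item()
    return norms
```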

A three-regime model of network pruning

Y Zhou, Y Yang, A Chang… - … on Machine Learning, 2023 - proceedings.mlr.press
Recent work has highlighted the complex influence training hyperparameters, e.g., the
number of training epochs, can have on the prunability of machine learning models …
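
Prunability in this line of work is typically probed with magnitude pruning, which removes the smallest-magnitude weights. A minimal global magnitude-pruning sketch in PyTorch (the 50% sparsity level is an arbitrary illustration, not a value from the paper):

```python
import torch
import torch.nn.utils.prune as prune

def magnitude_prune(model, amount=0.5):
    """Globally prune the smallest-magnitude weights across Linear/Conv2d layers."""
    params = [
        (m, "weight")
        for m in model.modules()
        if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d))
    ]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)
    # Make the pruning permanent by folding the binary masks into the weights.
    for m, name in params:
        prune.remove(m, name)
    return model
```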

Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond

A Jeffares, A Curth, M van der Schaar - arXiv preprint arXiv:2411.00247, 2024 - arxiv.org
Deep learning sometimes appears to work in unexpected ways. In pursuit of a deeper
understanding of its surprising behaviors, we investigate the utility of a simple yet accurate …