Sharpness minimization algorithms do not only minimize sharpness to achieve better generalization

K Wen, Z Li, T Ma - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Despite extensive studies, the underlying reason as to why overparameterized neural
networks can generalize remains elusive. Existing theory shows that common stochastic …

Double trouble in double descent: Bias and variance (s) in the lazy regime

S d'Ascoli, M Refinetti, G Biroli… - … on Machine Learning, 2020 - proceedings.mlr.press
Deep neural networks can achieve remarkable generalization performances while
interpolating the training data. Rather than the U-curve emblematic of the bias-variance …

On the proof of global convergence of gradient descent for deep ReLU networks with linear widths

Q Nguyen - International Conference on Machine Learning, 2021 - proceedings.mlr.press
We give a simple proof for the global convergence of gradient descent in training deep
ReLU networks with the standard square loss, and show some of its improvements over the …

What causes the test error? Going beyond bias-variance via ANOVA

L Lin, E Dobriban - Journal of Machine Learning Research, 2021 - jmlr.org
Modern machine learning methods are often overparametrized, allowing adaptation to the
data at a fine level. This can seem puzzling; in the worst case, such models do not need to …

Kernel and rich regimes in overparametrized models

B Woodworth, S Gunasekar, JD Lee… - … on Learning Theory, 2020 - proceedings.mlr.press
A recent line of work studies overparametrized neural networks in the “kernel regime,” i.e.,
when during training the network behaves as a kernelized linear predictor, and thus, training …
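
(Illustration, not from the cited paper.) In the kernel or “lazy” regime the network is well approximated by its first-order Taylor expansion in the parameters around initialization, so its outputs move like those of a linear predictor over fixed features. A minimal NumPy sketch of that linearization for a two-layer ReLU network, with all names, widths, and scalings chosen here purely for illustration:

# Hypothetical sketch (not the authors' code): compare a two-layer ReLU
# network with its first-order Taylor expansion around initialization.
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 1000                       # input dimension, hidden width
W0 = rng.normal(size=(m, d))         # first-layer weights at initialization
a0 = rng.normal(size=m)              # second-layer weights at initialization

def f(x, W, a):
    """Two-layer ReLU network with 1/sqrt(m) output scaling."""
    return a @ np.maximum(W @ x, 0.0) / np.sqrt(m)

def f_linearized(x, W, a):
    """Taylor expansion of f in the parameters around (W0, a0)."""
    h0 = np.maximum(W0 @ x, 0.0)                                 # activations at init
    grad_a = h0 / np.sqrt(m)                                     # df/da at init
    grad_W = ((a0 * (W0 @ x > 0))[:, None] * x) / np.sqrt(m)     # df/dW at init
    return f(x, W0, a0) + grad_a @ (a - a0) + np.sum(grad_W * (W - W0))

x = rng.normal(size=d)
W = W0 + 0.01 * rng.normal(size=W0.shape)   # small parameter movement
a = a0 + 0.01 * rng.normal(size=m)
print(f(x, W, a), f_linearized(x, W, a))

For large width m and small parameter movement the two printed values nearly coincide, which is the sense in which the network “behaves as a kernelized linear predictor.”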

Rethinking bias-variance trade-off for generalization of neural networks

Z Yang, Y Yu, C You, J Steinhardt… - … on Machine Learning, 2020 - proceedings.mlr.press
The classical bias-variance trade-off predicts that bias decreases and variance increases with
model complexity, leading to a U-shaped risk curve. Recent work calls this into question for …
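
(Illustration, not from the cited paper.) The U-shaped risk curve follows from the standard decomposition of expected squared error into bias^2 + variance + irreducible noise. A small NumPy sketch that estimates the bias^2 and variance terms for polynomial regressors of increasing degree by refitting on resampled training sets (the setup and constants are hypothetical):

# Hypothetical sketch: estimate bias^2 and variance of polynomial regressors
# of increasing degree by refitting on freshly sampled noisy training sets.
import numpy as np

rng = np.random.default_rng(0)
x_test = np.linspace(-1, 1, 200)
f_true = lambda x: np.sin(np.pi * x)      # ground-truth regression function
noise_sd = 0.3

def fit_predict(degree):
    """Fit one polynomial on a fresh noisy training set; predict on x_test."""
    x_tr = rng.uniform(-1, 1, 30)
    y_tr = f_true(x_tr) + rng.normal(0, noise_sd, x_tr.shape)
    coefs = np.polyfit(x_tr, y_tr, degree)
    return np.polyval(coefs, x_test)

for degree in [1, 3, 9]:
    preds = np.stack([fit_predict(degree) for _ in range(500)])
    bias2 = np.mean((preds.mean(axis=0) - f_true(x_test)) ** 2)
    var = np.mean(preds.var(axis=0))
    print(f"degree={degree}: bias^2={bias2:.3f}, variance={var:.3f}")

In this classical picture bias^2 falls and variance rises with the degree; the cited paper examines how that picture changes for neural networks.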

Training of deep neural networks based on distance measures using RMSProp

T Kurbiel, S Khaleghian - arXiv preprint arXiv:1708.01911, 2017 - arxiv.org
The vanishing gradient problem was a major obstacle for the success of deep learning. In
recent years it was gradually alleviated through multiple different techniques. However, the …
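
(Illustration, not from the cited paper.) The snippet does not detail the paper's distance-measure variant, so the sketch below shows only the standard RMSProp update it builds on, which scales each step by a running root-mean-square of past gradients (names and constants are illustrative):

# Minimal sketch of the standard RMSProp update rule.
import numpy as np

def rmsprop_step(param, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
    """One RMSProp update: a running average of squared gradients scales the step."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache

# Usage on a toy quadratic loss L(w) = ||w||^2 / 2, whose gradient is w.
w = np.array([1.0, -2.0])
cache = np.zeros_like(w)
for _ in range(100):
    w, cache = rmsprop_step(w, grad=w, cache=cache, lr=0.05)
print(w)  # approaches the minimizer at the origin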

Convex geometry and duality of over-parameterized neural networks

T Ergen, M Pilanci - Journal of Machine Learning Research, 2021 - jmlr.org
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We first prove that an optimal solution to the regularized training problem can be …

Universal readout for graph convolutional neural networks

N Navarin, D Van Tran… - 2019 International Joint …, 2019 - ieeexplore.ieee.org
Several machine learning problems can be naturally defined over graph data. Recently,
many researchers have been focusing on the definition of neural networks for graphs. The …

Grokking: Generalization beyond overfitting on small algorithmic datasets

A Power, Y Burda, H Edwards, I Babuschkin… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper we propose to study generalization of neural networks on small algorithmically
generated datasets. In this setting, questions about data efficiency, memorization …