Benign overfitting in two-layer convolutional neural networks

Y Cao, Z Chen, M Belkin, Q Gu - Advances in neural …, 2022 - proceedings.neurips.cc
Modern neural networks often have great expressive power and can be trained to overfit the
training data, while still achieving good test performance. This phenomenon is referred to …
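
As a rough illustration of the phenomenon this paper studies (not its exact construction), the sketch below trains a small two-layer CNN-style model on a synthetic two-patch signal-plus-noise dataset with flipped labels until it fits the noisy training set, then measures clean test accuracy. The data model, width, signal strength, and step size are illustrative assumptions; whether a given run actually exhibits benign overfitting depends on these choices.

```python
# Minimal sketch (assumptions: synthetic two-patch data, illustrative
# hyperparameters) of "interpolate noisy labels yet still generalize".
import torch

torch.manual_seed(0)
d, n_train, n_test, width, noise_rate = 200, 50, 1000, 20, 0.1
mu = torch.zeros(d); mu[0] = 5.0                         # signal direction and strength

def make_data(n):
    y = torch.randint(0, 2, (n,)) * 2 - 1                # clean labels in {-1, +1}
    signal = y[:, None] * mu                             # patch 1: y * mu
    noise = torch.randn(n, d)                            # patch 2: pure noise
    flip = torch.rand(n) < noise_rate                    # flip a fraction of labels
    y_obs = torch.where(flip, -y, y)
    return torch.stack([signal, noise], dim=1), y_obs.float(), y.float()

class TwoLayerCNN(torch.nn.Module):
    """Shared filters applied to each patch, ReLU, then fixed +/- pooling."""
    def __init__(self):
        super().__init__()
        self.w_pos = torch.nn.Parameter(0.01 * torch.randn(width, d))
        self.w_neg = torch.nn.Parameter(0.01 * torch.randn(width, d))
    def forward(self, x):                                # x: (n, 2, d)
        pos = torch.relu(x @ self.w_pos.T).sum(dim=(1, 2))
        neg = torch.relu(x @ self.w_neg.T).sum(dim=(1, 2))
        return pos - neg

x_tr, y_tr, _ = make_data(n_train)
x_te, _, y_te_clean = make_data(n_test)

model = TwoLayerCNN()
opt = torch.optim.SGD(model.parameters(), lr=0.05)
for step in range(5000):
    # logistic loss log(1 + exp(-y f(x))) on the (noisy) training labels
    loss = torch.nn.functional.softplus(-y_tr * model(x_tr)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

train_acc = (model(x_tr).sign() == y_tr).float().mean().item()        # fit to noisy labels
test_acc = (model(x_te).sign() == y_te_clean).float().mean().item()   # clean-label accuracy
print(f"train acc (noisy labels): {train_acc:.2f}, test acc (clean labels): {test_acc:.2f}")
```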

Benign overfitting without linearity: Neural network classifiers trained by gradient descent for noisy linear data

S Frei, NS Chatterji, P Bartlett - Conference on Learning …, 2022 - proceedings.mlr.press
Benign overfitting, the phenomenon where interpolating models generalize well in the
presence of noisy data, was first observed in neural network models trained with gradient …
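
In the noisy-label classification setting studied in this line of work, "interpolating models that generalize well" can be formalized as follows (a standard formalization, not a quote from the paper):

```latex
% Benign overfitting for a classifier f trained on (x_i, \tilde y_i), where each
% clean label y_i is flipped independently with probability p < 1/2.
\begin{align*}
\text{interpolation:} \quad
  & \operatorname{sign}\bigl(f(x_i)\bigr) = \tilde y_i
    \quad \text{for all } i = 1, \dots, n, \\
\text{benign overfitting:} \quad
  & \Pr_{(x, y)}\bigl[\operatorname{sign}(f(x)) \neq y\bigr] \le p + \varepsilon
    \quad \text{for small } \varepsilon > 0,
\end{align*}
% i.e. the model fits every (possibly corrupted) training label, yet its test
% error approaches the label-noise rate, the best achievable in this model.
```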

Gradient descent on two-layer nets: Margin maximization and simplicity bias

K Lyu, Z Li, R Wang, S Arora - Advances in Neural …, 2021 - proceedings.neurips.cc
The generalization mystery of overparametrized deep nets has motivated efforts to
understand how gradient descent (GD) converges to low-loss solutions that generalize well …
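
For context, the margin maximization referenced in the title is usually phrased via the normalized margin of a homogeneous network; the statement below is standard background from this line of work rather than a quote from the paper.

```latex
% For an L-homogeneous network f(\theta; x), i.e. f(c\theta; x) = c^L f(\theta; x)
% for all c > 0, the normalized margin on a dataset \{(x_i, y_i)\}_{i=1}^n is
\[
  \gamma(\theta) \;=\; \frac{\min_{1 \le i \le n} y_i f(\theta; x_i)}{\|\theta\|_2^{L}} .
\]
% Once the training loss is small enough, gradient descent keeps increasing
% \gamma(\theta) and converges in direction to a KKT point of the associated
% margin-maximization problem.
```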

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Benign overfitting in two-layer ReLU convolutional neural networks

Y Kou, Z Chen, Y Chen, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
Modern deep learning models with great expressive power can be trained to overfit the
training data but still generalize well. This phenomenon is referred to as benign overfitting …

Understanding incremental learning of gradient descent: A fine-grained analysis of matrix sensing

J Jin, Z Li, K Lyu, SS Du, JD Lee - … Conference on Machine …, 2023 - proceedings.mlr.press
It is believed that Gradient Descent (GD) induces an implicit bias towards good
generalization in training machine learning models. This paper provides a fine-grained …
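
As a rough sketch of the matrix sensing setup analyzed here (dimensions, measurement count, and step size are illustrative assumptions, not the paper's exact setting), gradient descent from small initialization on the factored objective tends to fit the larger eigendirections of the ground truth first, which is the "incremental learning" behaviour in the title.

```python
# Matrix sensing sketch: GD from small init on the factored objective
# L(U) = (1/(2m)) * sum_i (<A_i, U U^T> - y_i)^2, illustrative sizes.
import numpy as np

rng = np.random.default_rng(0)
d, r_star, k, m, lr, steps = 20, 2, 5, 400, 0.05, 2000

# Ground-truth low-rank PSD matrix X* with eigenvalues 2.0 and 0.5.
Q, _ = np.linalg.qr(rng.standard_normal((d, r_star)))
X_star = Q @ np.diag([2.0, 0.5]) @ Q.T

# Random symmetric Gaussian measurements y_i = <A_i, X*>.
A = rng.standard_normal((m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2
y = np.einsum('mij,ij->m', A, X_star)

U = 1e-3 * rng.standard_normal((d, k))          # small, overparameterized init
for t in range(steps):
    residual = np.einsum('mij,ij->m', A, U @ U.T) - y
    grad = 2.0 / m * np.einsum('m,mij,jk->ik', residual, A, U)
    U -= lr * grad
    if t % 400 == 0:
        # Top eigenvalues of U U^T: larger directions of X* are fit earlier.
        eigs = np.sort(np.linalg.eigvalsh(U @ U.T))[::-1][:r_star + 1]
        print(t, np.round(eigs, 3))

print("recovery error:", np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star))
```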

Understanding and improving feature learning for out-of-distribution generalization

Y Chen, W Huang, K Zhou, Y Bian… - Advances in Neural …, 2024 - proceedings.neurips.cc
A common explanation for the failure of out-of-distribution (OOD) generalization is that the
model trained with empirical risk minimization (ERM) learns spurious features instead of …

Benign overfitting in linear classifiers and leaky ReLU networks from KKT conditions for margin maximization

S Frei, G Vardi, P Bartlett… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have
an implicit bias towards solutions which satisfy the Karush–Kuhn–Tucker (KKT) conditions …
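
For reference, the margin-maximization problem and the KKT conditions appearing in the title can be written in the standard form used in this line of work (for nonsmooth networks such as leaky ReLU, the gradients below are Clarke subgradients):

```latex
% Margin maximization for a predictor f(\theta; x) on data \{(x_i, y_i)\}_{i=1}^n:
\[
  \min_{\theta} \ \tfrac{1}{2}\|\theta\|_2^2
  \quad \text{s.t.} \quad y_i f(\theta; x_i) \ge 1 \ \ \text{for all } i .
\]
% A feasible \theta satisfies the KKT conditions if there exist
% \lambda_1, \dots, \lambda_n \ge 0 (stationarity and complementary slackness) with
\[
  \theta = \sum_{i=1}^{n} \lambda_i\, y_i\, \nabla_\theta f(\theta; x_i),
  \qquad
  \lambda_i \bigl( y_i f(\theta; x_i) - 1 \bigr) = 0 \ \ \text{for all } i .
\]
```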

Implicit bias in leaky ReLU networks trained on high-dimensional data

S Frei, G Vardi, PL Bartlett, N Srebro, W Hu - arXiv preprint arXiv …, 2022 - arxiv.org
The implicit biases of gradient-based optimization algorithms are conjectured to be a major
factor in the success of modern deep learning. In this work, we investigate the implicit bias of …

Random feature amplification: Feature learning and generalization in neural networks

S Frei, NS Chatterji, PL Bartlett - Journal of Machine Learning Research, 2023 - jmlr.org
In this work, we provide a characterization of the feature-learning process in two-layer ReLU
networks trained by gradient descent on the logistic loss following random initialization. We …
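
As background for the training setup described here (two-layer ReLU network, logistic loss, gradient descent after random initialization), the objective and per-neuron update can be written as follows; the 1/m output scaling and fixed second-layer signs are common conventions, not necessarily the paper's exact parameterization.

```latex
% Two-layer ReLU network with first-layer weights w_1, \dots, w_m and fixed
% second-layer signs a_j \in \{\pm 1\}:
\[
  f(W; x) = \frac{1}{m}\sum_{j=1}^{m} a_j\, \sigma(\langle w_j, x\rangle),
  \qquad \sigma(z) = \max(z, 0).
\]
% Gradient descent on the logistic loss \ell(z) = \log(1 + e^{-z}) over
% the training set \{(x_i, y_i)\}_{i=1}^n, with step size \eta:
\[
  w_j^{(t+1)} = w_j^{(t)}
  - \frac{\eta}{n}\sum_{i=1}^{n}
    \ell'\!\bigl(y_i f(W^{(t)}; x_i)\bigr)\, y_i\, \frac{a_j}{m}\,
    \sigma'\!\bigl(\langle w_j^{(t)}, x_i\rangle\bigr)\, x_i .
\]
```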