BOME! Bilevel optimization made easy: A simple first-order approach

B Liu, M Ye, S Wright, P Stone… - Advances in neural …, 2022 - proceedings.neurips.cc
Bilevel optimization (BO) is useful for solving a variety of important machine learning
problems including but not limited to hyperparameter optimization, meta-learning, continual …
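As general background for this and the other bilevel entries below (a standard formulation, not a detail taken from this particular paper), bilevel optimization nests an inner problem inside an outer one:

```latex
\min_{x} \; F(x) := f\bigl(x,\, y^{*}(x)\bigr)
\quad \text{s.t.} \quad
y^{*}(x) \in \operatorname*{arg\,min}_{y} \; g(x, y),
```

where $f$ is the upper-level (outer) objective and $g$ the lower-level (inner) objective; in hyperparameter optimization, for instance, $x$ is the hyperparameter vector and $g$ the training loss.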

Benign overfitting without linearity: Neural network classifiers trained by gradient descent for noisy linear data

S Frei, NS Chatterji, P Bartlett - Conference on Learning …, 2022 - proceedings.mlr.press
Benign overfitting, the phenomenon where interpolating models generalize well in the
presence of noisy data, was first observed in neural network models trained with gradient …

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Benign overfitting in two-layer ReLU convolutional neural networks

Y Kou, Z Chen, Y Chen, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
Modern deep learning models with great expressive power can be trained to overfit the
training data but still generalize well. This phenomenon is referred to as benign overfitting …

Implicit bias in leaky ReLU networks trained on high-dimensional data

S Frei, G Vardi, PL Bartlett, N Srebro, W Hu - arXiv preprint arXiv …, 2022 - arxiv.org
The implicit biases of gradient-based optimization algorithms are conjectured to be a major
factor in the success of modern deep learning. In this work, we investigate the implicit bias of …

On penalty methods for nonconvex bilevel optimization and first-order stochastic approximation

J Kwon, D Kwon, S Wright, R Nowak - arXiv preprint arXiv:2309.01753, 2023 - arxiv.org
In this work, we study first-order algorithms for solving Bilevel Optimization (BO) where the
objective functions are smooth but possibly nonconvex in both levels and the variables are …
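The penalty approach named in the title can be sketched as follows (the standard penalty reformulation such works analyze; the exact assumptions and constants here are illustrative, not quoted from the paper). The inner problem is folded into the objective via its value-function gap:

```latex
\min_{x,\, y} \;\; f(x, y) \;+\; \sigma \Bigl( g(x, y) - \min_{y'} g(x, y') \Bigr),
```

with penalty parameter $\sigma > 0$; the gap term is nonnegative and vanishes exactly when $y$ solves the lower-level problem, so as $\sigma \to \infty$ the penalized problem approaches the original bilevel problem.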

On momentum-based gradient methods for bilevel optimization with nonconvex lower-level

F Huang - arXiv preprint arXiv:2303.03944, 2023 - arxiv.org
Bilevel optimization is a popular two-level hierarchical optimization framework, which has been widely
applied to many machine learning tasks such as hyperparameter learning, meta learning …

Self-training converts weak learners to strong learners in mixture models

S Frei, D Zou, Z Chen, Q Gu - International Conference on …, 2022 - proceedings.mlr.press
We consider a binary classification problem when the data comes from a mixture of two
rotationally symmetric distributions satisfying concentration and anti-concentration …

On feature learning in neural networks with global convergence guarantees

Z Chen, E Vanden-Eijnden, J Bruna - arXiv preprint arXiv:2204.10782, 2022 - arxiv.org
We study the optimization of wide neural networks (NNs) via gradient flow (GF) in setups that
allow feature learning while admitting non-asymptotic global convergence guarantees. First …

Enhanced adaptive gradient algorithms for nonconvex-PL minimax optimization

F Huang - arXiv preprint arXiv:2303.03984, 2023 - arxiv.org
In this paper, we study a class of nonconvex nonconcave minimax optimization problems (i.e.,
$\min_x \max_y f(x, y)$), where $f(x, y)$ is possibly nonconvex in $x$, and it is …
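For context on the "PL" in the title (a standard statement of the Polyak–Łojasiewicz condition, not necessarily the paper's exact assumption): the inner maximization is assumed to satisfy, for some $\mu > 0$,

```latex
\tfrac{1}{2}\,\bigl\|\nabla_y f(x, y)\bigr\|^2
\;\ge\;
\mu \Bigl( \max_{y'} f(x, y') - f(x, y) \Bigr)
\quad \text{for all } x, y,
```

which guarantees gradient ascent on $y$ converges linearly to an inner maximizer even when $f(x, \cdot)$ is not concave.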