On the implicit bias in deep-learning algorithms

G Vardi - Communications of the ACM, 2023 - dl.acm.org
Deep learning has been highly successful in recent years and has led to dramatic improvements in multiple domains …

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Transformers learn through gradual rank increase

E Abbe, S Bengio, E Boix-Adsera… - Advances in …, 2024 - proceedings.neurips.cc
We identify incremental learning dynamics in transformers, where the difference between
trained and initial weights progressively increases in rank. We rigorously prove this occurs …
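
As a rough illustration of the kind of measurement involved (a minimal sketch, not the authors' code; the toy attention layer, regression task, and rank tolerance are all assumptions), one can snapshot the initial weights and track the numerical rank of W_t - W_0 during training:

```python
# Hypothetical sketch: track the numerical rank of the weight update W_t - W_0
# while training a tiny one-head attention layer on a toy regression task.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n = 32, 256
X = torch.randn(n, 8, d)                      # 256 sequences, length 8, dim 32
Y = X.mean(dim=1)                             # toy target: mean-pool of tokens

attn = nn.MultiheadAttention(embed_dim=d, num_heads=1, batch_first=True)
W0 = attn.in_proj_weight.detach().clone()     # snapshot of initial Q/K/V weights
opt = torch.optim.Adam(attn.parameters(), lr=1e-3)

def numerical_rank(M, tol=1e-2):
    s = torch.linalg.svdvals(M)               # singular values, descending
    return int((s > tol * s[0]).sum())

for step in range(2001):
    out, _ = attn(X, X, X)
    loss = ((out.mean(dim=1) - Y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 400 == 0:
        delta = attn.in_proj_weight.detach() - W0
        print(f"step {step:4d}  loss {loss.item():.4f}  "
              f"rank(W_t - W_0) ~ {numerical_rank(delta)}")
```

Whether the rank grows in clearly separated increments depends on the architecture, task, and initialization scale; the paper proves it in a specific regime.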

Implicit bias of gradient descent for two-layer ReLU and leaky ReLU networks on nearly-orthogonal data

Y Kou, Z Chen, Q Gu - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The implicit bias towards solutions with favorable properties is believed to be a key reason
why neural networks trained by gradient-based optimization can generalize well. While the …

Benign overfitting in linear classifiers and leaky ReLU networks from KKT conditions for margin maximization

S Frei, G Vardi, P Bartlett… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Linear classifiers and leaky ReLU networks trained by gradient flow on the logistic loss have
an implicit bias towards solutions which satisfy the Karush–Kuhn–Tucker (KKT) conditions …
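
For the linear case this bias is easy to probe numerically (a minimal sketch on separable synthetic data, not the paper's setting; using LinearSVC with a large C as a hard-margin surrogate is an assumption): after long training on the logistic loss, the gradient-descent iterate should align in direction with the max-margin solution, which is the KKT point of the margin-maximization problem.

```python
# Hypothetical sketch: compare the direction of a logistic-loss GD iterate with
# the hard-margin SVM direction on linearly separable data.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, d = 200, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)                       # separable labels in {-1, +1}

w = np.zeros(d)
lr = 0.1
for _ in range(200_000):                      # long training lets ||w|| grow
    margins = y * (X @ w)
    coef = y / (1 + np.exp(np.clip(margins, None, 50)))   # y * sigmoid(-margin)
    grad = -(X * coef[:, None]).mean(axis=0)
    w -= lr * grad

svm = LinearSVC(C=1e6, loss="hinge", fit_intercept=False, max_iter=100_000).fit(X, y)
w_svm = svm.coef_.ravel()

cos = w @ w_svm / (np.linalg.norm(w) * np.linalg.norm(w_svm))
print(f"cosine(GD direction, max-margin direction) = {cos:.4f}")
```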

The double-edged sword of implicit bias: Generalization vs. robustness in ReLU networks

S Frei, G Vardi, P Bartlett… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we study the implications of the implicit bias of gradient flow on generalization
and adversarial robustness in ReLU networks. We focus on a setting where the data …
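
A toy version of the tension (a hedged sketch, not the paper's construction; the Gaussian-cluster data, perturbation budget, and FGSM-style attack are assumptions) trains a two-layer ReLU network that generalizes well on clean data and then measures accuracy under a small adversarial perturbation:

```python
# Hypothetical sketch: good clean accuracy does not imply robustness to small
# loss-increasing perturbations of the test inputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 100
mu = torch.zeros(d); mu[0] = 3.0

def sample(n):
    y = torch.randint(0, 2, (n,)).float() * 2 - 1
    return y[:, None] * mu + torch.randn(n, d), y

Xtr, ytr = sample(500)
Xte, yte = sample(2000)

net = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05)
for _ in range(3000):
    loss = torch.nn.functional.softplus(-ytr * net(Xtr).squeeze(-1)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def accuracy(X, y):
    return ((net(X).squeeze(-1) > 0).float() * 2 - 1 == y).float().mean().item()

# FGSM-style attack: move each test point by eps in the loss-increasing direction.
eps = 0.5
Xadv = Xte.clone().requires_grad_(True)
torch.nn.functional.softplus(-yte * net(Xadv).squeeze(-1)).mean().backward()
Xadv = (Xte + eps * Xadv.grad.sign()).detach()

print(f"clean test acc: {accuracy(Xte, yte):.2f}, "
      f"acc under eps={eps} perturbation: {accuracy(Xadv, yte):.2f}")
```

How much accuracy drops depends on eps and the data scale; the point of the sketch is only that the two quantities can diverge.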

Benign overfitting and grokking in ReLU networks for XOR cluster data

Z Xu, Y Wang, S Frei, G Vardi, W Hu - arXiv preprint arXiv:2310.02541, 2023 - arxiv.org
Neural networks trained by gradient descent (GD) have exhibited a number of surprising
generalization behaviors. First, they can achieve a perfect fit to noisy training data and still …
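
A minimal sketch of such an experiment (not the paper's setup; the cluster means, noise rate, width, and training length are assumptions) fits a wide two-layer ReLU network on XOR cluster data with some flipped labels and compares train accuracy on the noisy labels with test accuracy on clean labels:

```python
# Hypothetical sketch: interpolate noisy XOR cluster data with a wide two-layer
# ReLU network and check generalization on clean test points.
import torch
import torch.nn as nn

torch.manual_seed(0)

def xor_clusters(n, d=20, noise_rate=0.1):
    # Clusters at +/-mu1 and +/-mu2; the label is the XOR of the two cluster signs.
    mu1, mu2 = torch.zeros(d), torch.zeros(d)
    mu1[0] = 4.0; mu2[1] = 4.0
    s1 = torch.randint(0, 2, (n,)) * 2 - 1
    s2 = torch.randint(0, 2, (n,)) * 2 - 1
    X = s1[:, None] * mu1 + s2[:, None] * mu2 + torch.randn(n, d)
    y = (s1 * s2).float()
    flip = torch.rand(n) < noise_rate
    return X, y, torch.where(flip, -y, y)

Xtr, _, ytr = xor_clusters(200)                       # noisy training labels
Xte, yte, _ = xor_clusters(2000, noise_rate=0.0)      # clean test labels

net = nn.Sequential(nn.Linear(20, 512), nn.ReLU(), nn.Linear(512, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05)

for step in range(5000):
    loss = torch.nn.functional.softplus(-ytr * net(Xtr).squeeze(-1)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def acc(X, y):
    return ((net(X).squeeze(-1) > 0).float() * 2 - 1 == y).float().mean().item()

print(f"train acc on noisy labels: {acc(Xtr, ytr):.2f}, "
      f"test acc on clean labels: {acc(Xte, yte):.2f}")
```

In this toy setup the network typically fits the noisy labels perfectly while keeping good clean accuracy, though the exact behavior depends on width, noise rate, and training time.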

Fourier circuits in neural networks: Unlocking the potential of large language models in mathematical reasoning and modular arithmetic

J Gu, C Li, Y Liang, Z Shi, Z Song… - arXiv preprint arXiv …, 2024 - openreview.net
In the evolving landscape of machine learning, a pivotal challenge lies in deciphering the
internal representations harnessed by neural networks and Transformers. Building on recent …
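
One common way to look for such structure (a hedged sketch, not the paper's method; the one-hidden-layer MLP, one-hot encoding, and modulus are assumptions) is to train a small network on modular addition and inspect the Fourier spectrum of its first-layer weights:

```python
# Hypothetical sketch: train an MLP on (a + b) mod p with one-hot inputs and look
# at the Fourier spectrum of the weights reading the "a" block.
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 23
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))   # all (a, b)
X = torch.cat([torch.nn.functional.one_hot(pairs[:, 0], p),
               torch.nn.functional.one_hot(pairs[:, 1], p)], dim=1).float()
y = (pairs[:, 0] + pairs[:, 1]) % p

net = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
opt = torch.optim.AdamW(net.parameters(), lr=1e-3, weight_decay=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(5000):
    loss = loss_fn(net(X), y)
    opt.zero_grad(); loss.backward(); opt.step()

acc = (net(X).argmax(dim=1) == y).float().mean().item()
W_a = net[0].weight.detach()[:, :p]            # weights reading the "a" one-hot block
spectrum = torch.fft.rfft(W_a, dim=1).abs().mean(dim=0)
print(f"train acc: {acc:.3f}")
print("mean |FFT| over neurons per frequency:", spectrum.round(decimals=2))
```

Fourier-circuit analyses predict the spectrum concentrates on a few frequencies; whether this sketch reproduces that cleanly depends on the modulus, width, and regularization.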

Random feature amplification: Feature learning and generalization in neural networks

S Frei, NS Chatterji, PL Bartlett - Journal of Machine Learning Research, 2023 - jmlr.org
In this work, we provide a characterization of the feature-learning process in two-layer ReLU
networks trained by gradient descent on the logistic loss following random initialization. We …
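
A minimal sketch of the amplification phenomenon (not the paper's analyzed regime; the signal direction mu, noise model, and learning rate are assumptions): at random initialization each neuron has only a small correlation with the signal direction, and gradient descent on the logistic loss amplifies it over training.

```python
# Hypothetical sketch: track how hidden-neuron weights align with the signal
# direction mu while training a two-layer ReLU network with GD on the logistic loss.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n, width = 50, 500, 200
mu = torch.zeros(d); mu[0] = 2.0
y = torch.randint(0, 2, (n,)).float() * 2 - 1
X = y[:, None] * mu + torch.randn(n, d)

net = nn.Sequential(nn.Linear(d, width, bias=False), nn.ReLU(),
                    nn.Linear(width, 1, bias=False))
opt = torch.optim.SGD(net.parameters(), lr=0.1)

def mean_alignment():
    W = net[0].weight.detach()                 # (width, d)
    return torch.nn.functional.cosine_similarity(W, mu.expand_as(W), dim=1).abs().mean().item()

for step in range(2001):
    loss = torch.nn.functional.softplus(-y * net(X).squeeze(-1)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss.item():.3f}  mean |cos(w_j, mu)| = {mean_alignment():.3f}")
```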

Neural Redshift: Random Networks are not Random Functions

D Teney, AM Nicolicioiu, V Hartmann… - Proceedings of the …, 2024 - openaccess.thecvf.com
Our understanding of the generalization capabilities of neural networks (NNs) is still
incomplete. Prevailing explanations are based on implicit biases of gradient descent (GD) but …
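
A crude way to see this point (a hedged sketch, not the paper's protocol; the 1-D input grid, architecture, and "low-frequency energy" statistic are assumptions) is to compare the Fourier spectra of untrained MLPs with white noise of the same length: randomly initialized ReLU networks already concentrate their energy at low frequencies, before any gradient descent.

```python
# Hypothetical sketch: untrained MLPs restricted to a 1-D grid are much smoother
# (more low-frequency) than random functions of the same variance.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
np.random.seed(0)
grid = torch.linspace(-3, 3, 512).unsqueeze(1)

def random_mlp(width=256, depth=3):
    layers, d_in = [], 1
    for _ in range(depth):
        layers += [nn.Linear(d_in, width), nn.ReLU()]
        d_in = width
    layers += [nn.Linear(d_in, 1)]
    return nn.Sequential(*layers)

def low_freq_energy(signal, k=10):
    spec = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    return spec[:k].sum() / spec.sum()

net_frac, noise_frac = [], []
for _ in range(20):
    with torch.no_grad():
        f = random_mlp()(grid).squeeze().numpy()
    net_frac.append(low_freq_energy(f))
    noise_frac.append(low_freq_energy(np.random.randn(512)))

print(f"fraction of energy in lowest 10 frequencies: "
      f"random MLP ~ {np.mean(net_frac):.2f}, white noise ~ {np.mean(noise_frac):.2f}")
```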