Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

The double-edged sword of implicit bias: Generalization vs. robustness in ReLU networks

S Frei, G Vardi, P Bartlett… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we study the implications of the implicit bias of gradient flow on generalization
and adversarial robustness in ReLU networks. We focus on a setting where the data …

From tempered to benign overfitting in ReLU neural networks

G Kornowski, G Yehudai… - Advances in Neural …, 2024 - proceedings.neurips.cc
Overparameterized neural networks (NNs) are observed to generalize well even when
trained to perfectly fit noisy data. This phenomenon has motivated a large body of work on …

Benign overfitting and grokking in ReLU networks for XOR cluster data

Z Xu, Y Wang, S Frei, G Vardi, W Hu - arXiv preprint arXiv:2310.02541, 2023 - arxiv.org
Neural networks trained by gradient descent (GD) have exhibited a number of surprising
generalization behaviors. First, they can achieve a perfect fit to noisy training data and still …

Fourier circuits in neural networks: Unlocking the potential of large language models in mathematical reasoning and modular arithmetic

J Gu, C Li, Y Liang, Z Shi, Z Song… - arXiv preprint arXiv …, 2024 - openreview.net
In the evolving landscape of machine learning, a pivotal challenge lies in deciphering the
internal representations harnessed by neural networks and Transformers. Building on recent …

Learning a neuron by a shallow ReLU network: Dynamics and implicit bias for correlated inputs

D Chistikov, M Englert, R Lazic - Advances in Neural …, 2023 - proceedings.neurips.cc
We prove that, for the fundamental regression task of learning a single neuron, training a
one-hidden-layer ReLU network of any width by gradient flow from a small initialisation …

Feature emergence via margin maximization: case studies in algebraic tasks

D Morwani, BL Edelman, CA Oncescu, R Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Understanding the internal representations learned by neural networks is a cornerstone
challenge in the science of machine learning. While there have been significant recent …

Vanishing gradients in reinforcement finetuning of language models

N Razin, H Zhou, O Saremi, V Thilak, A Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Pretrained language models are commonly aligned with human preferences and
downstream tasks via reinforcement finetuning (RFT), which entails maximizing a (possibly …

Precise asymptotic generalization for multiclass classification with overparameterized linear models

D Wu, A Sahai - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
We study the asymptotic generalization of an overparameterized linear model for multiclass
classification under the Gaussian covariates bi-level model introduced in Subramanian et …

Noisy interpolation learning with shallow univariate ReLU networks

N Joshi, G Vardi, N Srebro - arXiv preprint arXiv:2307.15396, 2023 - arxiv.org
We study the asymptotic overfitting behavior of interpolation with minimum-norm ($\ell_2$ norm of
the weights) two-layer ReLU networks for noisy univariate regression. We show that …