Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Learning a neuron by a shallow relu network: Dynamics and implicit bias for correlated inputs

D Chistikov, M Englert, R Lazic - Advances in Neural …, 2023 - proceedings.neurips.cc
We prove that, for the fundamental regression task of learning a single neuron, training a
one-hidden-layer ReLU network of any width by gradient flow from a small initialisation …
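A minimal sketch of this student-teacher setup, assuming a unit-norm teacher neuron and plain gradient descent with a small step size as a stand-in for gradient flow (all dimensions and hyperparameters here are illustrative, not taken from the paper):

```python
import torch

torch.manual_seed(0)
d, width, n = 20, 50, 512

# Teacher: a single ReLU neuron y = relu(<w*, x>) with unit-norm w*.
w_star = torch.randn(d)
w_star /= w_star.norm()
X = torch.randn(n, d)
y = torch.relu(X @ w_star)

# Student: one-hidden-layer ReLU network with small initialisation.
init_scale = 1e-3  # illustrative; the theory concerns the small-init regime
W = (init_scale * torch.randn(width, d)).requires_grad_()
a = (init_scale * torch.randn(width)).requires_grad_()

opt = torch.optim.SGD([W, a], lr=1e-2)  # small-lr GD as a proxy for gradient flow
for step in range(5001):
    opt.zero_grad()
    pred = torch.relu(X @ W.T) @ a
    loss = 0.5 * ((pred - y) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step}: loss {loss.item():.6f}")
```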

Adversarial Robustness of In-Context Learning in Transformers for Linear Regression

U Anwar, J Von Oswald, L Kirsch, D Krueger… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have demonstrated remarkable in-context learning capabilities across various
domains, including statistical learning tasks. While previous work has shown that …

Trained transformer classifiers generalize and exhibit benign overfitting in-context

S Frei, G Vardi - arXiv preprint arXiv:2410.01774, 2024 - arxiv.org
Transformers have the capacity to act as supervised learning algorithms: by properly
encoding a set of labeled training ("in-context") examples and an unlabeled test example …
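As a rough illustration of such an encoding (one common convention from the in-context-learning literature, not necessarily the exact scheme used in this paper), labeled pairs and the unlabeled query can be stacked into a single token sequence:

```python
import torch

def build_context(xs, ys, x_query):
    """Stack labeled (x, y) pairs plus an unlabeled query into one token
    sequence of shape (2n + 1, d + 1), interleaving x-tokens and y-tokens."""
    n, d = xs.shape
    tokens = torch.zeros(2 * n + 1, d + 1)
    tokens[0:2 * n:2, :d] = xs   # even positions: inputs
    tokens[1:2 * n:2, d] = ys    # odd positions: labels in the last slot
    tokens[-1, :d] = x_query     # final token: query, label slot left empty
    return tokens

xs = torch.randn(8, 4)                   # 8 in-context examples in R^4
ys = torch.sign(xs @ torch.randn(4))     # labels from a random linear rule
context = build_context(xs, ys, torch.randn(4))
print(context.shape)                     # torch.Size([17, 5])
```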

Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data

B Li, Y Li - arXiv preprint arXiv:2410.08503, 2024 - arxiv.org
Adversarial training is a widely applied approach to training deep neural networks to be
robust against adversarial perturbation. However, although adversarial training has …
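For context, a bare-bones version of the standard adversarial training loop, with a PGD-style inner maximization; this is the generic recipe, not the structured-data setting analyzed in the paper:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.03, alpha=0.01, steps=10):
    """Generate L-infinity-bounded adversarial examples by projected
    gradient ascent on the cross-entropy loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # gradient ascent step
            delta.clamp_(-eps, eps)             # project back into the eps-ball
        delta.grad.zero_()
    return (x + delta).detach()

def adversarial_training_step(model, opt, x, y):
    """One outer step: train on the worst-case perturbed batch."""
    x_adv = pgd_attack(model, x, y)
    opt.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    opt.step()
    return loss.item()
```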

Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks

B Li, Z Pan, K Lyu, J Li - arXiv preprint arXiv:2410.10322, 2024 - arxiv.org
In this work, we investigate a particular implicit bias in the gradient descent training process,
which we term "Feature Averaging", and argue that it is one of the principal factors …

Optimization dependent generalization bound for ReLU networks based on sensitivity in the tangent bundle

D Rácz, M Petreczky, A Csertán, B Daróczy - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in deep learning have given us some very promising results on the
generalization ability of deep neural networks; however, the literature still lacks a comprehensive …

Can Implicit Bias Imply Adversarial Robustness?

H Min, R Vidal - arXiv preprint arXiv:2405.15942, 2024 - arxiv.org
The implicit bias of gradient-based training algorithms has been considered mostly
beneficial as it leads to trained networks that often generalize well. However, Frei et …

MALT Powers Up Adversarial Attacks

O Melamed, G Yehudai, A Shamir - arXiv preprint arXiv:2407.02240, 2024 - arxiv.org
Current adversarial attacks for multi-class classifiers choose the target class for a given input
naively, based on the classifier's confidence levels for various target classes. We present a …
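The naive scheme being criticized can be sketched as follows: rank candidate target classes by the model's confidence and aim a targeted attack at the top-ranked wrong class (illustrative code, not the MALT algorithm itself):

```python
import torch
import torch.nn.functional as F

def naive_target_class(model, x, true_label):
    """Pick the attack target as the highest-confidence incorrect class."""
    with torch.no_grad():
        logits = model(x.unsqueeze(0)).squeeze(0)
    logits[true_label] = float("-inf")  # exclude the true class
    return int(logits.argmax())

def targeted_fgsm(model, x, target, eps=0.03):
    """One targeted FGSM step: move x to *decrease* the loss on `target`."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x.unsqueeze(0)), torch.tensor([target]))
    loss.backward()
    return (x - eps * x.grad.sign()).detach()
```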