Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …
D Chistikov, M Englert, R Lazic - Advances in Neural …, 2023 - proceedings.neurips.cc
We prove that, for the fundamental regression task of learning a single neuron, training a one-hidden layer ReLU network of any width by gradient flow from a small initialisation …
Transformers have demonstrated remarkable in-context learning capabilities across various domains, including statistical learning tasks. While previous work has shown that …
S Frei, G Vardi - arXiv preprint arXiv:2410.01774, 2024 - arxiv.org
Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training ("in-context") examples and an unlabeled test example …
B Li, Y Li - arXiv preprint arXiv:2410.08503, 2024 - arxiv.org
Adversarial training is a widely applied approach to training deep neural networks to be robust against adversarial perturbation. However, although adversarial training has …
B Li, Z Pan, K Lyu, J Li - arXiv preprint arXiv:2410.10322, 2024 - arxiv.org
In this work, we investigate a particular implicit bias in the gradient descent training process, which we term "Feature Averaging", and argue that it is one of the principal factors …
Recent advances in deep learning have given us some very promising results on the generalization ability of deep neural networks; however, the literature still lacks a comprehensive …
H Min, R Vidal - arXiv preprint arXiv:2405.15942, 2024 - arxiv.org
The implicit bias of gradient-based training algorithms has been considered mostly beneficial as it leads to trained networks that often generalize well. However, Frei et …
O Melamed, G Yehudai, A Shamir - arXiv preprint arXiv:2407.02240, 2024 - arxiv.org
Current adversarial attacks for multi-class classifiers choose the target class for a given input naively, based on the classifier's confidence levels for various target classes. We present a …