A recent line of research on deep learning focuses on the extremely over-parameterized setting and shows that when the network width is larger than a high-degree polynomial of …
A Montanari, Y Zhong - The Annals of Statistics, 2022 - projecteuclid.org
The interpolation phase transition in neural networks: Memorization and generalization under lazy training. The Annals of Statistics, Vol. 50, No. 5, 2816–2847. https://doi.org/10.1214/22-AOS2211 …
P Deora, R Ghaderi, H Taheri… - arXiv preprint arXiv …, 2023 - arxiv.org
The training and generalization dynamics of the Transformer's core component, the attention mechanism, remain under-explored. Moreover, existing analyses primarily focus on …
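For orientation, the attention mechanism referred to here is, in its standard single-head form, the softmax-weighted map below; this is the textbook definition written in generic notation, not notation taken from the cited paper.

\mathrm{Attn}(X) \;=\; \mathrm{softmax}\!\left(\frac{(X W_Q)(X W_K)^{\top}}{\sqrt{d_k}}\right) X W_V,

where X \in \mathbb{R}^{n \times d} stacks the n input tokens, W_Q, W_K, W_V are trainable projection matrices, d_k is the key dimension, and the softmax is applied row-wise.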
A common practice when training neural networks is to initialize all the weights as independent Gaussian vectors. We observe that by instead initializing the weights into …
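As a point of reference, the standard scheme mentioned in the first sentence of that snippet (all weights drawn as independent Gaussian vectors) can be sketched in a few lines of NumPy; the pairing-based alternative is truncated in the snippet and is not reproduced here. The layer sizes and variance choice below are illustrative assumptions.

import numpy as np

def gaussian_init(fan_in, fan_out, rng):
    # Independent Gaussian initialization: every row is an i.i.d. N(0, 1/fan_in) vector,
    # a common variance choice that keeps pre-activations at unit scale.
    return rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_out, fan_in))

rng = np.random.default_rng(0)
W1 = gaussian_init(784, 512, rng)  # hidden-layer weights: 512 independent Gaussian vectors
W2 = gaussian_init(512, 10, rng)   # output-layer weights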
Q Nguyen - International Conference on Machine Learning, 2021 - proceedings.mlr.press
We give a simple proof of the global convergence of gradient descent when training deep ReLU networks with the standard square loss, and show some of its improvements over the …
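The setting this result concerns, full-batch gradient descent on a deep ReLU network under the square loss, can be written as a short NumPy sketch; the widths, depth, step size, and data below are illustrative placeholders, not values from the paper.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(Ws, X):
    # Deep ReLU network: ReLU after every hidden layer, linear output layer.
    acts = [X]
    for W in Ws[:-1]:
        acts.append(relu(acts[-1] @ W))
    return acts, acts[-1] @ Ws[-1]

def square_loss_grads(Ws, X, y):
    # Gradients of the standard square loss (1/2n) * ||f(X) - y||^2 via backprop.
    acts, out = forward(Ws, X)
    delta = (out - y) / X.shape[0]
    grads = [None] * len(Ws)
    for l in reversed(range(len(Ws))):
        grads[l] = acts[l].T @ delta
        if l > 0:
            delta = (delta @ Ws[l].T) * (acts[l] > 0)  # ReLU derivative mask
    return grads

rng = np.random.default_rng(0)
n, d, width, depth, lr = 64, 10, 256, 3, 0.05
X = rng.normal(size=(n, d))
y = rng.normal(size=(n, 1))
dims = [d] + [width] * depth + [1]
Ws = [rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, k)) for m, k in zip(dims[:-1], dims[1:])]

for _ in range(200):  # plain full-batch gradient descent
    Ws = [W - lr * G for W, G in zip(Ws, square_loss_grads(Ws, X, y))]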
Y Wang, P Mianjy, R Arora - International Conference on …, 2021 - proceedings.mlr.press
We investigate the robustness of stochastic approximation approaches against data poisoning attacks. We focus on two-layer neural networks with ReLU activation and show …
QN Nguyen, M Mondelli - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Recent works have shown that gradient descent can find a global minimum for over-parameterized neural networks where the widths of all the hidden layers scale polynomially …
Modern neural network architectures often generalize well despite containing far more parameters than training examples. This paper explores the generalization …
In these six lectures, we examine what can be learnt about the behavior of multi-layer neural networks from the analysis of linear models. We first recall the correspondence between …
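The correspondence with linear models invoked here is, in standard accounts of lazy training, the first-order Taylor expansion of the network around its initialization; the display below is the usual statement, written in generic notation rather than quoted from the lectures.

f_{\mathrm{lin}}(x;\theta) \;=\; f(x;\theta_0) \;+\; \nabla_\theta f(x;\theta_0)^{\top}(\theta - \theta_0),

so that, in the lazy regime, training the network reduces to linear regression over the fixed feature map \phi(x) = \nabla_\theta f(x;\theta_0), with associated kernel K(x, x') = \langle \phi(x), \phi(x') \rangle (the neural tangent kernel).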