Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks

Z Ji, M Telgarsky - arXiv preprint arXiv:1909.12292, 2019 - arxiv.org
Recent theoretical work has guaranteed that overparameterized networks trained by
gradient descent achieve arbitrarily low training error, and sometimes even low test error …
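
As a concrete illustration of the setting in this abstract, the sketch below trains an over-parameterized one-hidden-layer ReLU network by full-batch gradient descent on a small synthetic classification task; the width, learning rate, data, and loss are placeholder choices, not the quantities analyzed in the paper.

```python
# Minimal sketch: over-parameterized shallow ReLU network trained by gradient
# descent on synthetic data. All hyperparameters are illustrative placeholders.
import torch

torch.manual_seed(0)
n, d, width = 100, 10, 4096          # width >> n: over-parameterized regime
X = torch.randn(n, d)
y = torch.sign(X[:, 0:1])            # simple separable labels in {-1, +1}

model = torch.nn.Sequential(
    torch.nn.Linear(d, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(500):
    opt.zero_grad()
    margins = y * model(X)                          # y_i * f(x_i)
    loss = torch.log1p(torch.exp(-margins)).mean()  # logistic loss
    loss.backward()
    opt.step()

train_err = (torch.sign(model(X)) != y).float().mean()
print(f"logistic loss {loss.item():.4f}, training error {train_err.item():.2%}")
```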

How much over-parameterization is sufficient to learn deep ReLU networks?

Z Chen, Y Cao, D Zou, Q Gu - arXiv preprint arXiv:1911.12360, 2019 - arxiv.org
A recent line of research on deep learning focuses on the extremely over-parameterized
setting, and shows that when the network width is larger than a high-degree polynomial of …

The interpolation phase transition in neural networks: Memorization and generalization under lazy training

A Montanari, Y Zhong - The Annals of Statistics, 2022 - projecteuclid.org
The Annals of Statistics, 2022, Vol. 50, No. 5, 2816–2847. https://doi.org/10.1214/22-AOS2211

On the optimization and generalization of multi-head attention

P Deora, R Ghaderi, H Taheri… - arXiv preprint arXiv …, 2023 - arxiv.org
The training and generalization dynamics of the Transformer's core mechanism, namely the
attention mechanism, remain under-explored. Moreover, existing analyses primarily focus on …
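
Since the abstract refers to the attention mechanism without restating it, the following is a minimal from-scratch sketch of multi-head self-attention; the embedding dimension, head count, and Gaussian weights are arbitrary illustrative choices.

```python
# Minimal sketch of multi-head self-attention (the Transformer component the
# abstract refers to); dimensions and head count are illustrative only.
import torch

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """X: (batch, seq, d_model); each weight matrix: (d_model, d_model)."""
    b, t, d = X.shape
    dk = d // num_heads
    # Project and split into heads: (batch, heads, seq, dk).
    def split(W):
        return (X @ W).view(b, t, num_heads, dk).transpose(1, 2)
    Q, K, V = split(Wq), split(Wk), split(Wv)
    scores = Q @ K.transpose(-2, -1) / dk ** 0.5       # scaled dot-product
    attn = torch.softmax(scores, dim=-1)               # attention weights
    out = (attn @ V).transpose(1, 2).reshape(b, t, d)  # merge heads
    return out @ Wo                                    # output projection

torch.manual_seed(0)
d_model, heads = 64, 4
X = torch.randn(2, 10, d_model)
Ws = [torch.randn(d_model, d_model) / d_model ** 0.5 for _ in range(4)]
print(multi_head_self_attention(X, *Ws, num_heads=heads).shape)  # (2, 10, 64)
```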

Bounding the width of neural networks via coupled initialization: a worst case analysis

A Munteanu, S Omlor, Z Song… - … on Machine Learning, 2022 - proceedings.mlr.press
A common method in training neural networks is to initialize all the weights to be
independent Gaussian vectors. We observe that by instead initializing the weights into …
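
The snippet is cut off before the alternative scheme is stated; one common form of coupled initialization, assumed here purely for illustration, duplicates each Gaussian hidden weight and gives the two copies output weights of opposite sign, so that the network output is exactly zero at initialization.

```python
# Hedged sketch of a "coupled" initialization for a one-hidden-layer ReLU net:
# each Gaussian hidden weight vector appears twice, with output weights of
# opposite sign, so f(x) = 0 exactly at initialization. (The abstract is
# truncated; this is one standard variant, not necessarily the paper's scheme.)
import torch

def coupled_init(d, width):
    assert width % 2 == 0, "width must be even to form pairs"
    half = width // 2
    W_half = torch.randn(half, d)                  # independent Gaussian vectors
    W = torch.cat([W_half, W_half], dim=0)         # each row appears twice
    a = torch.cat([torch.ones(half), -torch.ones(half)])  # opposite output signs
    return W, a

def forward(X, W, a):
    return torch.relu(X @ W.t()) @ a               # f(x) = sum_j a_j relu(w_j.x)

torch.manual_seed(0)
W, a = coupled_init(d=5, width=8)
X = torch.randn(3, 5)
print(forward(X, W, a))                            # exactly zero at initialization
```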

On the proof of global convergence of gradient descent for deep ReLU networks with linear widths

Q Nguyen - International Conference on Machine Learning, 2021 - proceedings.mlr.press
We give a simple proof for the global convergence of gradient descent in training deep
ReLU networks with the standard square loss, and show some of its improvements over the …
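
To make the training setup in this abstract concrete, the sketch below runs gradient descent with the standard square loss on a deep ReLU network whose hidden widths are on the order of the sample size; depth, width, and step size are placeholder choices rather than the paper's bounds.

```python
# Illustrative sketch: deep ReLU network trained by gradient descent on the
# square loss. Depth, width, and learning rate are placeholder choices.
import torch

torch.manual_seed(0)
n, d = 64, 8
X = torch.randn(n, d)
y = torch.sin(X.sum(dim=1, keepdim=True))        # arbitrary regression targets

width = 2 * n                                    # hidden width on the order of n
dims = [d, width, width, width, 1]               # three hidden ReLU layers
layers = []
for i in range(len(dims) - 1):
    layers.append(torch.nn.Linear(dims[i], dims[i + 1]))
    if i < len(dims) - 2:
        layers.append(torch.nn.ReLU())
model = torch.nn.Sequential(*layers)

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    loss = 0.5 * ((model(X) - y) ** 2).mean()    # standard square loss
    loss.backward()
    opt.step()
print(f"final square loss: {loss.item():.6f}")
```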

Robust learning for data poisoning attacks

Y Wang, P Mianjy, R Arora - International Conference on …, 2021 - proceedings.mlr.press
We investigate the robustness of stochastic approximation approaches against data
poisoning attacks. We focus on two-layer neural networks with ReLU activation and show …
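
As a toy version of the threat model in the abstract, the sketch below flips the labels of a fraction of the training set before running single-sample SGD on a two-layer ReLU network; the label-flipping attack and all hyperparameters are illustrative assumptions, not the paper's setting.

```python
# Toy sketch of data poisoning: an adversary flips a fraction of the training
# labels, then SGD is run on a two-layer ReLU network. The attack model and
# hyperparameters are illustrative assumptions only.
import torch

torch.manual_seed(0)
n, d, width, poison_frac = 200, 10, 256, 0.1
X = torch.randn(n, d)
y = torch.sign(X[:, 0:1])                        # clean labels in {-1, +1}

y_poisoned = y.clone()
idx = torch.randperm(n)[: int(poison_frac * n)]  # adversary picks 10% of points
y_poisoned[idx] *= -1                            # ... and flips their labels

model = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.ReLU(), torch.nn.Linear(width, 1)
)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
for epoch in range(20):
    for i in torch.randperm(n).tolist():         # one-sample SGD passes
        opt.zero_grad()
        margin = y_poisoned[i] * model(X[i : i + 1])
        loss = torch.log1p(torch.exp(-margin)).mean()
        loss.backward()
        opt.step()

clean_err = (torch.sign(model(X)) != y).float().mean()
print(f"error on clean labels after training on poisoned data: {clean_err:.2%}")
```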

Global convergence of deep networks with one wide layer followed by pyramidal topology

QN Nguyen, M Mondelli - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Recent works have shown that gradient descent can find a global minimum for over-
parameterized neural networks where the widths of all the hidden layers scale polynomially …
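
To make the architecture in the abstract concrete, the sketch below builds a ReLU network with one wide first hidden layer followed by a pyramidal (non-increasing width) tail; the specific widths are arbitrary illustrative choices.

```python
# Sketch of the architecture the abstract describes: one wide first hidden
# layer followed by a pyramidal (non-increasing width) tail. Widths here are
# arbitrary illustrative choices.
import torch

def pyramidal_relu_net(d_in, wide_width, tail_widths, d_out=1):
    """First hidden layer has `wide_width` units; `tail_widths` must be
    non-increasing to respect the pyramidal topology."""
    assert all(a >= b for a, b in zip(tail_widths, tail_widths[1:]))
    dims = [d_in, wide_width] + list(tail_widths) + [d_out]
    layers = []
    for i in range(len(dims) - 1):
        layers.append(torch.nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:
            layers.append(torch.nn.ReLU())
    return torch.nn.Sequential(*layers)

net = pyramidal_relu_net(d_in=20, wide_width=1024, tail_widths=[128, 64, 32])
print(net(torch.randn(4, 20)).shape)             # torch.Size([4, 1])
```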

Generalization guarantees for neural networks via harnessing the low-rank structure of the Jacobian

S Oymak, Z Fabian, M Li, M Soltanolkotabi - arXiv preprint arXiv …, 2019 - arxiv.org
Modern neural network architectures often generalize well despite containing many more
parameters than the size of the training dataset. This paper explores the generalization …
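
To show the object the abstract studies, the sketch below stacks the per-example gradients of a small ReLU network's output with respect to its parameters into an n x p Jacobian and inspects its singular-value profile; the network and data sizes are toy choices.

```python
# Sketch: form the n x p Jacobian of the network outputs with respect to the
# parameters and inspect its singular values (the object whose low-rank
# structure the abstract refers to). Sizes are toy choices for illustration.
import torch

torch.manual_seed(0)
n, d, width = 40, 5, 64
X = torch.randn(n, d)
model = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.ReLU(), torch.nn.Linear(width, 1)
)
params = list(model.parameters())

rows = []
for i in range(n):
    grads = torch.autograd.grad(model(X[i : i + 1]).squeeze(), params)
    rows.append(torch.cat([g.reshape(-1) for g in grads]))  # d f(x_i) / d theta
J = torch.stack(rows)                            # Jacobian, shape (n, num_params)

s = torch.linalg.svdvals(J)
print("top singular values:", s[:5])
print("fraction of energy in top 5:", ((s[:5] ** 2).sum() / (s ** 2).sum()).item())
```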

Six lectures on linearized neural networks

T Misiakiewicz, A Montanari - arXiv preprint arXiv:2308.13431, 2023 - arxiv.org
In these six lectures, we examine what can be learnt about the behavior of multi-layer neural
networks from the analysis of linear models. We first recall the correspondence between …
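
Since the lectures revolve around the correspondence between neural networks and linear models, the sketch below compares a small ReLU network with its first-order Taylor expansion around initialization, which is a model that is linear in the parameters; the sizes and the perturbation scale are illustrative.

```python
# Sketch of the linearization the lectures discuss: the first-order Taylor
# expansion of the network around its initialization,
#   f_lin(x; theta) = f(x; theta0) + <grad_theta f(x; theta0), theta - theta0>,
# which is linear in the parameters. Sizes are illustrative.
import copy
import torch

torch.manual_seed(0)
d, width = 5, 128
model = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.ReLU(), torch.nn.Linear(width, 1)
)
model0 = copy.deepcopy(model)                    # frozen copy at initialization
theta0 = [p.detach().clone() for p in model0.parameters()]

def linearized(x, theta):
    """f(x; theta0) + <grad_theta f(x; theta0), theta - theta0>."""
    out0 = model0(x)
    grads = torch.autograd.grad(out0.sum(), list(model0.parameters()))
    corr = sum(((t - t0) * g).sum() for t, t0, g in zip(theta, theta0, grads))
    return out0.detach() + corr

# Perturb the real network slightly and compare it with its linearization;
# staying close to the linear model is the "lazy" regime the lectures analyze.
with torch.no_grad():
    for p in model.parameters():
        p.add_(1e-3 * torch.randn_like(p))
theta = [p.detach() for p in model.parameters()]

x = torch.randn(1, d)
print("network:   ", model(x).item())
print("linearized:", linearized(x, theta).item())
```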