Deja vu: Contextual sparsity for efficient LLMs at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …

A theoretical analysis of deep Q-learning

J Fan, Z Wang, Y Xie, Z Yang - Learning for dynamics and …, 2020 - proceedings.mlr.press
Despite the great empirical success of deep reinforcement learning, its theoretical
foundation is less well understood. In this work, we make the first attempt to theoretically …

Benign overfitting without linearity: Neural network classifiers trained by gradient descent for noisy linear data

S Frei, NS Chatterji, P Bartlett - Conference on Learning …, 2022 - proceedings.mlr.press
Benign overfitting, the phenomenon where interpolating models generalize well in the
presence of noisy data, was first observed in neural network models trained with gradient …

How much over-parameterization is sufficient to learn deep ReLU networks?

Z Chen, Y Cao, D Zou, Q Gu - arXiv preprint arXiv:1911.12360, 2019 - arxiv.org
A recent line of research on deep learning focuses on the extremely over-parameterized
setting, and shows that when the network width is larger than a high-degree polynomial of …

Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? A Neural Tangent Kernel Perspective

K Huang, Y Wang, M Tao… - Advances in neural …, 2020 - proceedings.neurips.cc
Deep residual networks (ResNets) have demonstrated better generalization performance
than deep feedforward networks (FFNets). However, the theory behind such a phenomenon …

Implicit bias in leaky ReLU networks trained on high-dimensional data

S Frei, G Vardi, PL Bartlett, N Srebro, W Hu - arXiv preprint arXiv …, 2022 - arxiv.org
The implicit biases of gradient-based optimization algorithms are conjectured to be a major
factor in the success of modern deep learning. In this work, we investigate the implicit bias of …

Implicit regularization of deep residual networks towards neural ODEs

P Marion, YH Wu, ME Sander, G Biau - arXiv preprint arXiv:2309.01213, 2023 - arxiv.org
Residual neural networks are state-of-the-art deep learning models. Their continuous-depth
analogs, neural ordinary differential equations (ODEs), are also widely used. Despite their …

Proxy convexity: A unified framework for the analysis of neural networks trained by gradient descent

S Frei, Q Gu - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Although the optimization objectives for learning neural networks are highly non-convex,
gradient-based methods have been wildly successful at learning neural networks in …

Overparameterization of deep ResNet: zero loss and mean-field analysis

Z Ding, S Chen, Q Li, SJ Wright - Journal of Machine Learning Research, 2022 - jmlr.org
Finding parameters in a deep neural network (NN) that fit training data is a nonconvex
optimization problem, but a basic first-order optimization method (gradient descent) finds a …

On the generalization of learning algorithms that do not converge

N Chandramoorthy, A Loukas… - Advances in Neural …, 2022 - proceedings.neurips.cc
Generalization analyses of deep learning typically assume that the training converges to a
fixed point. But recent results indicate that, in practice, the weights of deep neural networks …