Deep linear networks for regression are implicitly regularized towards flat minima

P Marion, L Chizat - arXiv preprint arXiv:2405.13456, 2024 - arxiv.org
The largest eigenvalue of the Hessian, or sharpness, of neural networks is a key quantity for
understanding their optimization dynamics. In this paper, we study the sharpness of deep linear …
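
Illustration (not from the paper): a minimal PyTorch sketch estimating the sharpness, i.e. the largest eigenvalue of the loss Hessian, of a small deep linear network via power iteration on Hessian-vector products. The depth-3 architecture, data, and iteration counts are hypothetical choices for the example.

import torch

torch.manual_seed(0)

# Toy depth-3 linear network trained with squared loss on synthetic data.
d, n, depth = 4, 32, 3
X = torch.randn(n, d)
Y = X @ torch.randn(d, d)                 # hypothetical regression targets
Ws = [(0.3 * torch.randn(d, d)).requires_grad_() for _ in range(depth)]

def loss():
    out = X
    for W in Ws:
        out = out @ W.T                   # apply W1, ..., W3 in turn
    return 0.5 * ((out - Y) ** 2).mean()

# Power iteration on Hessian-vector products; it returns the eigenvalue of
# largest magnitude, which coincides with the sharpness near a minimum.
g = torch.autograd.grad(loss(), Ws, create_graph=True)
flat_g = torch.cat([gi.reshape(-1) for gi in g])
v = torch.randn(flat_g.numel())
for _ in range(100):
    v = v / v.norm()
    hv = torch.autograd.grad(flat_g @ v, Ws, retain_graph=True)
    hv = torch.cat([h.reshape(-1) for h in hv])
    eig = v @ hv                          # Rayleigh quotient estimate
    v = hv
print("estimated sharpness (top Hessian eigenvalue):", eig.item())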

A convergence analysis of gradient descent for deep linear neural networks

S Arora, N Cohen, N Golowich, W Hu - arXiv preprint arXiv:1810.02281, 2018 - arxiv.org
We analyze the speed of convergence to the global optimum for gradient descent training a deep
linear neural network (parameterized as $x \mapsto W_N W_{N-1} \cdots W_1 x$) by …
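
Illustration (not the paper's code): a short PyTorch sketch of plain gradient descent on this product parameterization, minimizing a least-squares objective over the end-to-end matrix $W_N \cdots W_1$. The dimensions, near-identity initialization, target matrix, and step size are hypothetical.

import torch

torch.manual_seed(0)

# Deep linear network as a product of N weight matrices, trained by
# gradient descent on 0.5 * ||W_N ... W_1 - target||_F^2.
d, N, steps, lr = 5, 3, 3000, 0.01
target = torch.randn(d, d)                # hypothetical target matrix
Ws = [(torch.eye(d) + 0.01 * torch.randn(d, d)).requires_grad_() for _ in range(N)]

opt = torch.optim.SGD(Ws, lr=lr)
for _ in range(steps):
    E = Ws[0]
    for W in Ws[1:]:
        E = W @ E                         # end-to-end matrix W_N ... W_1
    loss = 0.5 * (E - target).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("end-to-end loss after gradient descent:", loss.item())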

Global convergence of gradient descent for deep linear residual networks

L Wu, Q Wang, C Ma - Advances in Neural Information …, 2019 - proceedings.neurips.cc
We analyze the global convergence of gradient descent for deep linear residual networks by
proposing a new initialization: zero-asymmetric (ZAS) initialization. It is motivated by …
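
Illustration only, under stated assumptions: a PyTorch sketch of a deep linear residual network x -> (I + A_L) ... (I + A_1) x with every residual block initialized at zero, which captures the "zero" part of the idea; the paper's full zero-asymmetric (ZAS) scheme additionally prescribes specific input/output layers that this toy example does not reproduce.

import torch

torch.manual_seed(0)

# Deep linear residual network with residual blocks A_l initialized to zero,
# trained by gradient descent on a least-squares objective.
d, L, steps, lr = 6, 10, 1000, 0.05
X = torch.randn(64, d)
Y = X @ torch.randn(d, d)                 # hypothetical linear targets
As = [torch.zeros(d, d, requires_grad=True) for _ in range(L)]

opt = torch.optim.SGD(As, lr=lr)
for _ in range(steps):
    H = X
    for A in As:
        H = H + H @ A.T                   # residual block: h <- (I + A) h
    loss = 0.5 * (H - Y).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("training loss from the zero residual initialization:", loss.item())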

Implicit regularization of discrete gradient dynamics in linear neural networks

G Gidel, F Bach… - Advances in Neural …, 2019 - proceedings.neurips.cc
When optimizing over-parameterized models, such as deep neural networks, a large set of
parameters can achieve zero training error. In such cases, the choice of the optimization …

Implicit bias in deep linear classification: Initialization scale vs training accuracy

E Moroshko, BE Woodworth… - Advances in neural …, 2020 - proceedings.neurips.cc
We provide a detailed asymptotic study of gradient flow trajectories and their implicit
optimization bias when minimizing the exponential loss over "diagonal linear networks". This …
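
Illustration (one common convention, not necessarily the paper's exact setup): a PyTorch sketch of a depth-2 diagonal linear network f(x) = <u * v, x> trained with the exponential loss on linearly separable synthetic data; the initialization scale alpha is the knob whose effect this line of work studies.

import torch

torch.manual_seed(0)

# Depth-2 diagonal linear network: the effective predictor is w = u * v
# (elementwise), trained on the exponential loss exp(-y_i <w, x_i>).
d, n, alpha, lr, steps = 10, 40, 0.1, 0.01, 5000
X = torch.randn(n, d)
y = torch.sign(X @ torch.randn(d))        # labels from a random linear teacher
u = torch.full((d,), alpha, requires_grad=True)
v = torch.full((d,), alpha, requires_grad=True)

opt = torch.optim.SGD([u, v], lr=lr)
for _ in range(steps):
    margins = y * (X @ (u * v))           # y_i * f(x_i)
    loss = torch.exp(-margins).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("effective linear predictor u * v:", (u * v).detach())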

Residual networks: Lyapunov stability and convex decomposition

K Nar, S Sastry - arXiv preprint arXiv:1803.08203, 2018 - arxiv.org
While the training error of most deep neural networks degrades as the depth of the network
increases, residual networks appear to be an exception. We show that the main reason for …

Gradient descent maximizes the margin of homogeneous neural networks

K Lyu, J Li - arXiv preprint arXiv:1906.05890, 2019 - arxiv.org
In this paper, we study the implicit regularization of the gradient descent algorithm in
homogeneous neural networks, including fully-connected and convolutional neural …
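
Illustration (hypothetical data and architecture): a PyTorch sketch of the scale-invariant normalized margin min_i y_i f(x_i) / ||theta||^L for an L-homogeneous network, here a bias-free two-layer ReLU network with homogeneity degree L = 2; this is the quantity whose evolution under gradient descent on exponential-type losses is analyzed in this line of work.

import torch

torch.manual_seed(0)

# Bias-free two-layer ReLU network: f(c * theta; x) = c^2 * f(theta; x),
# so the margin divided by ||theta||^2 is invariant to rescaling theta.
d, h, n = 5, 16, 30
W1 = torch.randn(h, d)
w2 = torch.randn(h)
X = torch.randn(n, d)
y = torch.sign(torch.randn(n))            # hypothetical +/-1 labels

def f(X):
    return torch.relu(X @ W1.T) @ w2      # no biases => 2-homogeneous

theta_sq = W1.pow(2).sum() + w2.pow(2).sum()      # ||theta||^2
normalized_margin = (y * f(X)).min() / theta_sq   # can be negative before training
print("normalized margin at this (random) parameter:", normalized_margin.item())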

On the explicit role of initialization on the convergence and implicit bias of overparametrized linear networks

H Min, S Tarmoun, R Vidal… - … Conference on Machine …, 2021 - proceedings.mlr.press
Neural networks trained via gradient descent with random initialization and without any
regularization enjoy good generalization performance in practice despite being highly …

Theoretical issues in deep networks

T Poggio, A Banburski, Q Liao - Proceedings of the …, 2020 - National Acad Sciences
While deep learning is successful in a number of applications, it is not yet well understood
theoretically. A theoretical characterization of deep learning should answer questions about …

The law of parsimony in gradient descent for learning deep linear networks

C Yaras, P Wang, W Hu, Z Zhu, L Balzano… - arXiv preprint arXiv …, 2023 - arxiv.org
Over the past few years, an extensively studied phenomenon in training deep networks is
the implicit bias of gradient descent towards parsimonious solutions. In this work, we …