A practical survey on faster and lighter transformers

Q Fournier, GM Caron, D Aloise - ACM Computing Surveys, 2023 - dl.acm.org
Recurrent neural networks are effective models to process sequences. However, they are
unable to learn long-term dependencies because of their inherent sequential nature. As a …
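
For context (a standard formula, not quoted from this survey): the Transformer architecture that the surveyed variants accelerate is built on scaled dot-product attention,

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V,$$

whose $O(n^2)$ cost in the sequence length $n$ is the bottleneck that "faster and lighter" variants aim to reduce.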

Analyzing and improving the training dynamics of diffusion models

T Karras, M Aittala, J Lehtinen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models currently dominate the field of data-driven image synthesis with their
unparalleled scaling to large datasets. In this paper we identify and rectify several causes for …

Understanding gradient descent on the edge of stability in deep learning

S Arora, Z Li, A Panigrahi - International Conference on …, 2022 - proceedings.mlr.press
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient
Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …
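
A minimal statement of the phenomenon named in the snippet (standard from Cohen et al., 2021, not quoted from this paper): during the EoS phase the sharpness, i.e., the top Hessian eigenvalue of the training loss, equilibrates just above the classical stability threshold,

$$\lambda_{\max}\!\left(\nabla^2 L(\theta_t)\right) \approx \frac{2}{\eta},$$

where $\eta$ is the learning rate; classical analysis predicts GD diverges once $\lambda_{\max} > 2/\eta$, which is what makes the observed stable training surprising.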

Understanding the generalization benefit of normalization layers: Sharpness reduction

K Lyu, Z Li, S Arora - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Normalization layers (e.g., Batch Normalization, Layer Normalization) were
introduced to help with optimization difficulties in very deep nets, but they clearly also help …
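
For reference, one of the two layers named above, Layer Normalization, computes (standard definition, not taken from this paper)

$$\mathrm{LN}(x)_i = \gamma_i\,\frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta_i, \qquad \mu = \frac{1}{d}\sum_{j=1}^{d} x_j, \quad \sigma^2 = \frac{1}{d}\sum_{j=1}^{d} (x_j - \mu)^2,$$

with learnable scale $\gamma$ and shift $\beta$; the paper's thesis, per its title, is that such layers also reduce the sharpness of the loss landscape.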

SGD with large step sizes learns sparse features

M Andriushchenko, AV Varre… - International …, 2023 - proceedings.mlr.press
We showcase important features of the dynamics of the Stochastic Gradient Descent (SGD)
in the training of neural networks. We present empirical observations that commonly used …

Deep learning approach towards accurate state of charge estimation for lithium-ion batteries using self-supervised transformer model

MA Hannan, DNT How, MSH Lipu, M Mansor, PJ Ker… - Scientific reports, 2021 - nature.com
Accurate state of charge (SOC) estimation of lithium-ion (Li-ion) batteries is crucial in
prolonging cell lifespan and ensuring its safe operation for electric vehicle applications. In …

What Happens after SGD Reaches Zero Loss? A Mathematical Framework

Z Li, T Wang, S Arora - arXiv preprint arXiv:2110.06914, 2021 - arxiv.org
Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key
challenges in deep learning, especially for overparametrized models, where the local …

A modern look at the relationship between sharpness and generalization

M Andriushchenko, F Croce, M Müller, M Hein… - arXiv preprint arXiv …, 2023 - arxiv.org
Sharpness of minima is a promising quantity that can correlate with generalization in deep
networks and, when optimized during training, can improve generalization. However …
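
The snippet does not define sharpness; a common worst-case notion studied in this literature (a standard definition, not quoted from the paper) is

$$S_\rho(\theta) = \max_{\|\delta\|_2 \le \rho} L(\theta + \delta) - L(\theta),$$

the largest loss increase within a perturbation ball of radius $\rho$; near a minimum, the top Hessian eigenvalue $\lambda_{\max}(\nabla^2 L(\theta))$ serves as its second-order proxy.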

On the validity of modeling SGD with stochastic differential equations (SDEs)

Z Li, S Malladi, S Arora - Advances in Neural Information …, 2021 - proceedings.neurips.cc
It is generally recognized that finite learning rate (LR), in contrast to infinitesimal LR, is
important for good generalization in real-life deep nets. Most attempted explanations …
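
The approximation in question (the standard form in this line of work, not quoted from the snippet) models SGD with learning rate $\eta$ by the Itô SDE

$$dX_t = -\nabla L(X_t)\,dt + \sqrt{\eta}\,\Sigma(X_t)^{1/2}\,dW_t,$$

where $\Sigma(X)$ is the covariance of the minibatch gradient noise at $X$; the paper examines when trajectories of this SDE actually track those of discrete SGD.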

Adapting the linearised laplace model evidence for modern deep learning

J Antorán, D Janz, JU Allingham… - International …, 2022 - proceedings.mlr.press
The linearised Laplace method for estimating model uncertainty has received renewed
attention in the Bayesian deep learning community. The method provides reliable error bars …
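
A minimal sketch of the method named in the title (its standard form, not quoted from this paper): the network $f(x;\theta)$ is linearised around the MAP estimate $\theta^*$,

$$f_{\mathrm{lin}}(x;\theta) = f(x;\theta^*) + J_{\theta^*}(x)\,(\theta - \theta^*),$$

and combining this with the Laplace posterior $\theta \sim \mathcal{N}(\theta^*, H^{-1})$, where $H$ approximates the loss Hessian at $\theta^*$, yields the closed-form error bars $\mathrm{Var}\!\left[f_{\mathrm{lin}}(x;\theta)\right] = J_{\theta^*}(x)\,H^{-1} J_{\theta^*}(x)^\top$.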