Vanishing Curvature in Randomly Initialized Deep ReLU Networks.

Evaluation of classification models in limited data scenarios with application to additive manufacturing

F Pourkamali-Anaraki, T Nasrin, RE Jensen… - … Applications of Artificial …, 2023 - Elsevier

This paper presents a novel framework that enables the generation of unbiased estimates
for test loss using fewer labeled samples, effectively evaluating the predictive performance …

Cited by 9

Heavy-tailed class imbalance and why Adam outperforms gradient descent on language models

F Kunstner, R Yadav, A Milligan, M Schmidt… - arXiv preprint arXiv …, 2024 - arxiv.org

Adam has been shown to outperform gradient descent in optimizing large language
transformers empirically, and by a larger margin than on other tasks, but it is unclear why this …

Cited by 18

An adaptive stochastic gradient method with non-negative Gauss-Newton stepsizes

A Orvieto, L Xiao - arXiv preprint arXiv:2407.04358, 2024 - arxiv.org

We consider the problem of minimizing the average of a large number of smooth but
possibly non-convex functions. In the context of most machine learning applications, each …

Cited by 1

Deconstructing the Goldilocks Zone of Neural Network Initialization

A Vysogorets, A Dawid, J Kempe - arXiv preprint arXiv:2402.03579, 2024 - arxiv.org

The second-order properties of the training loss have a massive impact on the optimization
dynamics of deep learning models. Fort & Scherlis (2019) discovered that a high positive …

FOSI: Hybrid First and Second Order Optimization

H Sivan, M Gabel, A Schuster - arXiv preprint arXiv:2302.08484, 2023 - arxiv.org

Though second-order optimization methods are highly effective, popular approaches in
machine learning such as SGD and Adam use only first-order information due to the difficulty …

Why do machine learning optimizers that work, work?

F Kunstner - 2024 - open.library.ubc.ca

The impressive recent applications of machine learning have coincided with an increase in
the costs of developing new methods. Beyond the obvious computational cost due to the …

Structured Generative Models for Controllable Scene and 3D Content Synthesis

D Pavllo - 2023 - research-collection.ethz.ch

Deep learning has fundamentally transformed the field of image synthesis, facilitated by the
emergence of generative models that demonstrate remarkable ability to generate …

Dynamics of Adaptive Momentum Optimizers on Challenging Deep Learning Landscapes

A Orvieto - 2023 - research-collection.ethz.ch

Deep learning technologies are skyrocketing in popularity across a wide range of domains,
with groundbreaking accomplishments in fields such as natural language processing …

Why Adam Outperforms Gradient Descent on Language Models: A Heavy-Tailed Class Imbalance Problem

R Yadav, F Kunstner, M Schmidt, A Bietti - OPT 2023: Optimization for … - openreview.net

We show that the heavy-tailed class imbalance found in language modeling tasks leads to
difficulties in optimization dynamics. When training with gradient descent, the loss …

Cited by 2
