Loss Landscape Characterization of Neural Networks without Over-Parametrization

R Islamov, N Ajroldi, A Orvieto, A Lucchi - arXiv preprint arXiv:2410.12455, 2024 - arxiv.org
Optimization methods play a crucial role in modern machine learning, powering the
remarkable empirical achievements of deep learning models. These successes are even …

Understanding Adam Requires Better Rotation Dependent Assumptions

L Maes, TH Zhang, A Jolicoeur-Martineau… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite its widespread adoption, Adam's advantage over Stochastic Gradient Descent
(SGD) lacks a comprehensive theoretical explanation. This paper investigates Adam's …
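
The rotation dependence named in the title is easy to see concretely: (S)GD is equivariant under orthogonal rotations of parameter space, while Adam's coordinate-wise normalization is not. The numpy sketch below, using a toy quadratic and a single first optimizer step, is an illustrative assumption of mine, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
H = A @ A.T + np.eye(d)                           # Hessian of 0.5 * x^T H x
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal rotation

def grad(H, x):
    return H @ x                                  # gradient of the quadratic

def gd_step(x, g, lr=0.1):
    return x - lr * g

def adam_step(x, g, lr=0.1, eps=1e-8):
    # First Adam step from zero state; after bias correction mhat = g and
    # vhat = g**2, so the update is roughly lr * sign(g), basis-dependent.
    return x - lr * g / (np.sqrt(g ** 2) + eps)

x = rng.standard_normal(d)
g = grad(H, x)
g_rot = grad(Q @ H @ Q.T, Q @ x)   # gradient of the rotated problem = Q @ g

# GD commutes with the rotation: stepping in the rotated problem equals
# rotating the step taken in the original problem.
print(np.abs(gd_step(Q @ x, g_rot) - Q @ gd_step(x, g)).max())     # ~1e-16

# Adam does not: the coordinate-wise denominator depends on the basis.
print(np.abs(adam_step(Q @ x, g_rot) - Q @ adam_step(x, g)).max())  # ~1e-1
```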

Accelerating Training with Neuron Interaction and Nowcasting Networks

B Knyazev, A Moudgil, G Lajoie, E Belilovsky, S Lacoste-Julien - bknyaz.github.io
Neural network training can be accelerated when a learnable update rule is used in lieu of
classic adaptive optimizers (e.g., Adam). However, learnable update rules can be costly and …
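
The core idea of a learnable update rule is a small meta-network that maps per-parameter features to an update, in place of a hand-designed formula. The PyTorch sketch below is a generic, untrained stand-in: the (grad, param) feature set and the tiny MLP are assumptions of mine, and the paper's actual architecture (modeling neuron interactions and nowcasting future parameters) is considerably richer.

```python
import torch

class LearnedRule(torch.nn.Module):
    """Toy learnable update rule: a per-parameter MLP over (grad, param)
    features. Illustrative only; not the NiNo architecture."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1),
        )

    def forward(self, grad, param):
        feats = torch.stack([grad.flatten(), param.flatten()], dim=-1)
        return self.net(feats).view_as(param)  # proposed per-parameter update

rule = LearnedRule()                  # in practice, meta-trained on many tasks
w = torch.randn(32, requires_grad=True)
loss = (w ** 2).sum()                 # toy objective
g, = torch.autograd.grad(loss, w)

with torch.no_grad():
    w_hand = w - 0.1 * g              # classic hand-designed step (SGD)
    w_learned = w - rule(g, w)        # step proposed by the meta-network
```

A meta-trained rule can take larger, better-shaped steps than a fixed formula, but evaluating the meta-network at every step adds overhead, which is the cost the snippet alludes to.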