Adaptive SGD with Polyak stepsize and line-search: Robust convergence and variance reduction

X Jiang, SU Stich - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for
SGD have shown remarkable effectiveness when training over-parameterized models …
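For concreteness, a minimal sketch of the two step-size rules named in this abstract, on a toy least-squares problem: the stochastic Polyak stepsize γ = (f_i(x) − f_i*)/(c‖∇f_i(x)‖²) capped at γ_max, with f_i* taken as 0 (interpolation setting), and a backtracking stochastic line search enforcing the stochastic Armijo condition. The constants c, gamma_max, gamma0, and beta are illustrative choices, not the paper's.

```python
import numpy as np

# Toy interpolation problem: every per-example loss can be driven to zero.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))
x_true = rng.normal(size=10)
b = A @ x_true

def f_i(x, i):          # loss of a single example
    return 0.5 * (A[i] @ x - b[i]) ** 2

def grad_i(x, i):       # its gradient
    return (A[i] @ x - b[i]) * A[i]

def sps_step(x, i, c=0.5, gamma_max=10.0, f_i_star=0.0):
    # Stochastic Polyak stepsize, capped at gamma_max (SPS_max-style rule).
    g = grad_i(x, i)
    gamma = min((f_i(x, i) - f_i_star) / (c * (g @ g) + 1e-12), gamma_max)
    return x - gamma * g

def sls_step(x, i, gamma0=1.0, beta=0.7, c=0.1):
    # Backtracking stochastic line search: shrink gamma until the
    # stochastic Armijo condition holds on the sampled example.
    g = grad_i(x, i)
    gamma = gamma0
    while f_i(x - gamma * g, i) > f_i(x, i) - c * gamma * (g @ g):
        gamma *= beta
    return x - gamma * g

x_sps = x_sls = np.zeros(10)
for t in range(2000):
    i = rng.integers(len(b))
    x_sps, x_sls = sps_step(x_sps, i), sls_step(x_sls, i)
print(np.linalg.norm(x_sps - x_true), np.linalg.norm(x_sls - x_true))
```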

A weakly supervised consistency-based learning method for COVID-19 segmentation in CT images


I Laradji, P Rodriguez, O Manas… - Proceedings of the …, 2021 - openaccess.thecvf.com
Coronavirus Disease 2019 (COVID-19) has spread aggressively across the world,
causing an existential health crisis. Thus, having a system that automatically detects COVID …

Almost sure convergence rates for stochastic gradient descent and stochastic heavy ball

O Sebbouh, RM Gower… - Conference on Learning …, 2021 - proceedings.mlr.press
We study stochastic gradient descent (SGD) and the stochastic heavy ball method (SHB,
otherwise known as the momentum method) for the general stochastic approximation …
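The SHB iteration referred to here is the standard momentum update x_{t+1} = x_t − γ g_t + β (x_t − x_{t−1}). A minimal sketch on a toy noisy least-squares problem; γ and β are illustrative values, not the paper's.

```python
import numpy as np

# Stochastic heavy ball (SHB / momentum):
#   x_{t+1} = x_t - gamma * g_t + beta * (x_t - x_{t-1}).
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 20))
x_star = rng.normal(size=20)
b = A @ x_star + 0.01 * rng.normal(size=200)  # small label noise

def stoch_grad(x, i):
    return (A[i] @ x - b[i]) * A[i]

gamma, beta = 0.001, 0.9
x_prev = x = np.zeros(20)
for t in range(5000):
    i = rng.integers(len(b))
    x, x_prev = x - gamma * stoch_grad(x, i) + beta * (x - x_prev), x
print(np.linalg.norm(x - x_star))
```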

Dynamics of SGD with stochastic Polyak stepsizes: Truly adaptive variants and convergence to exact solution

A Orvieto, S Lacoste-Julien… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recently, Loizou et al. (2021) proposed and analyzed stochastic gradient descent
(SGD) with stochastic Polyak stepsize (SPS). The proposed SPS comes with strong …
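The point of this abstract is that plain SPS only reaches a neighborhood of the solution when interpolation fails, and truly adaptive variants recover convergence to the exact solution. Below is an illustrative capped variant whose upper bound decays like 1/√t; this is only a stand-in to show the idea, not the paper's exact DecSPS rule, and f_i* is assumed to be 0.

```python
import numpy as np

# Noisy targets: interpolation does NOT hold, so a constant-cap SPS would
# only converge to a neighborhood; a decaying cap lets the error keep shrinking.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
x_star = rng.normal(size=10)
b = A @ x_star + 0.1 * rng.normal(size=200)

def f_i(x, i):
    return 0.5 * (A[i] @ x - b[i]) ** 2

def grad_i(x, i):
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(10)
c, gamma_b = 1.0, 1.0
for t in range(20000):
    i = rng.integers(len(b))
    g = grad_i(x, i)
    gamma = min(f_i(x, i) / (c * (g @ g) + 1e-12), gamma_b / np.sqrt(t + 1))
    x -= gamma * g
print(np.linalg.norm(x - np.linalg.lstsq(A, b, rcond=None)[0]))
```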

Nest your adaptive algorithm for parameter-agnostic nonconvex minimax optimization

J Yang, X Li, N He - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex optimization
owing to their parameter-agnostic ability, requiring no a priori knowledge about problem …
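For reference, the parameter-agnostic behavior mentioned here comes from scaling each coordinate by the accumulated squared gradients. A minimal sketch of diagonal AdaGrad on a toy problem; eta and eps are illustrative choices.

```python
import numpy as np

# Diagonal AdaGrad: per-coordinate step eta / sqrt(sum of past squared grads),
# so no smoothness constant needs to be supplied up front.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
x_star = rng.normal(size=10)
b = A @ x_star

def stoch_grad(x, i):
    return (A[i] @ x - b[i]) * A[i]

eta, eps = 1.0, 1e-8
x = np.zeros(10)
accum = np.zeros(10)           # running sum of squared gradients
for t in range(5000):
    i = rng.integers(len(b))
    g = stoch_grad(x, i)
    accum += g ** 2
    x -= eta * g / (np.sqrt(accum) + eps)
print(np.linalg.norm(x - x_star))
```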

Amortized proximal optimization

J Bae, P Vicol, JZ HaoChen… - Advances in Neural …, 2022 - proceedings.neurips.cc
We propose a framework for online meta-optimization of parameters that govern
optimization, called Amortized Proximal Optimization (APO). We first interpret various …

Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent

S Vaswani, B Dubois-Taine… - … on machine learning, 2022 - proceedings.mlr.press
We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^2$ in
the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth …
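One ingredient commonly analyzed in this line of work is SGD with an exponentially decreasing step-size schedule γ_t = γ_0 α^t, which interpolates between a constant step (noiseless case) and a decaying one (noisy case). The sketch below only shows that schedule on a toy problem; γ_0, T, and the decay target are illustrative assumptions, not the paper's method in full.

```python
import numpy as np

# SGD with exponentially decreasing step sizes gamma_t = gamma_0 * alpha**t.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
x_star = rng.normal(size=10)
b = A @ x_star + 0.1 * rng.normal(size=200)   # noisy targets, sigma^2 > 0

def stoch_grad(x, i):
    return (A[i] @ x - b[i]) * A[i]

T, gamma_0 = 10000, 0.05
alpha = (1.0 / T) ** (1.0 / T)   # decays the step down to gamma_0 / T by time T
x = np.zeros(10)
for t in range(T):
    i = rng.integers(len(b))
    x -= gamma_0 * alpha ** t * stoch_grad(x, i)
print(np.linalg.norm(x - np.linalg.lstsq(A, b, rcond=None)[0]))
```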

Why line search when you can plane search? SO-friendly neural networks allow per-iteration optimization of learning and momentum rates for every layer

B Shea, M Schmidt - arXiv preprint arXiv:2406.17954, 2024 - arxiv.org
We introduce the class of SO-friendly neural networks, which includes several models used in
practice, including networks with 2 layers of hidden weights where the number of inputs is …
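A plane search, as opposed to a line search, jointly picks a learning rate and a momentum rate each iteration by minimizing over a two-dimensional subspace. The paper's contribution is doing that subproblem efficiently for SO-friendly architectures; the sketch below is only a naive grid-evaluation stand-in to convey the idea, with illustrative candidate grids.

```python
import numpy as np

# Naive per-iteration "plane search" over span{gradient direction, previous step}:
# evaluate the loss on a small grid of (alpha, beta) pairs and keep the best.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
x_star = rng.normal(size=10)
b = A @ x_star

def loss(x):
    return 0.5 * np.mean((A @ x - b) ** 2)

def grad(x):
    return A.T @ (A @ x - b) / len(b)

alphas = np.logspace(-3, 0, 8)          # candidate learning rates
betas = np.linspace(0.0, 0.9, 8)        # candidate momentum rates
x, prev_step = np.zeros(10), np.zeros(10)
for t in range(50):
    d = -grad(x)
    _, a, m = min((loss(x + a * d + m * prev_step), a, m)
                  for a in alphas for m in betas)
    step = a * d + m * prev_step
    x, prev_step = x + step, step
print(loss(x))
```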

SVRG meets AdaGrad: Painless variance reduction

B Dubois-Taine, S Vaswani, R Babanezhad… - Machine Learning, 2022 - Springer
Variance reduction (VR) methods for finite-sum minimization typically require the knowledge
of problem-dependent constants that are often unknown and difficult to estimate. To address …
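To make the combination concrete: SVRG's variance-reduced gradient g = ∇f_i(x) − ∇f_i(x̃) + ∇f(x̃) (with snapshot x̃) can be fed into an AdaGrad-style update so that no smoothness constant has to be supplied. A minimal sketch under that reading; the hyperparameters and inner-loop length are illustrative, not the paper's exact choices.

```python
import numpy as np

# SVRG-style variance-reduced gradients driven by AdaGrad step sizes.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
x_star = rng.normal(size=10)
b = A @ x_star

def grad_i(x, i):
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - b) / len(b)

eta, eps = 1.0, 1e-8
x = np.zeros(10)
accum = np.zeros(10)
for epoch in range(30):
    x_snap, mu = x.copy(), full_grad(x)      # snapshot point and its full gradient
    for _ in range(len(b)):                  # inner loop over sampled examples
        i = rng.integers(len(b))
        g = grad_i(x, i) - grad_i(x_snap, i) + mu   # variance-reduced gradient
        accum += g ** 2
        x -= eta * g / (np.sqrt(accum) + eps)
print(np.linalg.norm(x - x_star))
```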

Target-based surrogates for stochastic optimization

JW Lavington, S Vaswani, R Babanezhad… - arXiv preprint arXiv …, 2023 - arxiv.org
We consider minimizing functions for which it is expensive to compute the (possibly
stochastic) gradient. Such functions are prevalent in reinforcement learning, imitation …