Optimal stochastic non-smooth non-convex optimization through online-to-non-convex conversion

A Cutkosky, H Mehta… - … Conference on Machine …, 2023 - proceedings.mlr.press
We present new algorithms for optimizing non-smooth, non-convex stochastic objectives
based on a novel analysis technique. This improves the current best-known complexity for …

High-probability bounds for stochastic optimization and variational inequalities: the case of unbounded variance

A Sadiev, M Danilova, E Gorbunov… - International …, 2023 - proceedings.mlr.press
In recent years, interest in high-probability convergence of stochastic optimization methods has been growing in the optimization and machine learning communities. One of …

Beyond uniform smoothness: A stopped analysis of adaptive SGD

M Faw, L Rout, C Caramanis… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
This work considers the problem of finding a first-order stationary point of a non-convex
function with potentially unbounded smoothness constant using a stochastic gradient oracle …

Methods for Convex (L0, L1)-Smooth Optimization: Clipping, Acceleration, and Adaptivity

E Gorbunov, N Tupitsa, S Choudhury, A Aliev… - arXiv preprint arXiv …, 2024 - arxiv.org
Due to the non-smoothness of optimization problems in Machine Learning, generalized
smoothness assumptions have been gaining a lot of attention in recent years. One of the …

Which mode is better for federated learning? Centralized or Decentralized

Y Sun, L Shen, D Tao - arXiv preprint arXiv:2310.03461, 2023 - arxiv.org
Both centralized and decentralized approaches have shown excellent performance and
great application value in federated learning (FL). However, current studies do not provide …

Directional Smoothness and Gradient Methods: Convergence and Adaptivity

A Mishkin, A Khaled, Y Wang, A Defazio… - arXiv preprint arXiv …, 2024 - arxiv.org
We develop new sub-optimality bounds for gradient descent (GD) that depend on the
conditioning of the objective along the path of optimization, rather than on global, worst-case …

Global convergence of the gradient method for functions definable in o-minimal structures

C Josz - Mathematical Programming, 2023 - Springer
We consider the gradient method with variable step size for minimizing functions that are
definable in o-minimal structures on the real field and differentiable with locally Lipschitz …

Recent Advances in Non-convex Smoothness Conditions and Applicability to Deep Linear Neural Networks

V Patel, C Varner - arXiv preprint arXiv:2409.13672, 2024 - arxiv.org
The presence of non-convexity in smooth optimization problems arising from deep learning
has sparked new smoothness conditions in the literature and corresponding convergence …

Gradient Descent-Based Task-Orientation Robot Control Enhanced with Gaussian Process Predictions

L Roveda, M Pavone - IEEE Robotics and Automation Letters, 2024 - ieeexplore.ieee.org
This letter proposes a novel force-based task-orientation controller for interaction tasks with
environmental orientation uncertainties. The main aim of the controller is to align the robot …

Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed

S Chezhegov, Y Klyukin, A Semenov… - arXiv preprint arXiv …, 2024 - arxiv.org
Methods with adaptive stepsizes, such as AdaGrad and Adam, are essential for training
modern Deep Learning models, especially Large Language Models. Typically, the noise in …