Optimal stochastic non-smooth non-convex optimization through online-to-non-convex conversion

A Cutkosky, H Mehta… - … Conference on Machine …, 2023 - proceedings.mlr.press
We present new algorithms for optimizing non-smooth, non-convex stochastic objectives
based on a novel analysis technique. This improves the current best-known complexity for …

High-probability bounds for stochastic optimization and variational inequalities: the case of unbounded variance

A Sadiev, M Danilova, E Gorbunov… - International …, 2023 - proceedings.mlr.press
In recent years, interest in high-probability convergence of stochastic optimization methods has been growing in the optimization and machine learning communities. One of …

Beyond uniform smoothness: A stopped analysis of adaptive SGD

M Faw, L Rout, C Caramanis… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
This work considers the problem of finding a first-order stationary point of a non-convex
function with potentially unbounded smoothness constant using a stochastic gradient oracle …

Methods for Convex (L0, L1)-Smooth Optimization: Clipping, Acceleration, and Adaptivity

E Gorbunov, N Tupitsa, S Choudhury, A Aliev… - arXiv preprint arXiv …, 2024 - arxiv.org
Due to the non-smoothness of optimization problems in Machine Learning, generalized
smoothness assumptions have been gaining a lot of attention in recent years. One of the …

Which mode is better for federated learning? Centralized or Decentralized

Y Sun, L Shen, D Tao - arXiv preprint arXiv:2310.03461, 2023 - arxiv.org
Both centralized and decentralized approaches have shown excellent performance and
great application value in federated learning (FL). However, current studies do not provide …

Directional Smoothness and Gradient Methods: Convergence and Adaptivity

A Mishkin, A Khaled, Y Wang, A Defazio… - arXiv preprint arXiv …, 2024 - arxiv.org
We develop new sub-optimality bounds for gradient descent (GD) that depend on the
conditioning of the objective along the path of optimization, rather than on global, worst-case …

Global convergence of the gradient method for functions definable in o-minimal structures

C Josz - Mathematical Programming, 2023 - Springer
We consider the gradient method with variable step size for minimizing functions that are
definable in o-minimal structures on the real field and differentiable with locally Lipschitz …

Recent Advances in Non-convex Smoothness Conditions and Applicability to Deep Linear Neural Networks

V Patel, C Varner - arXiv preprint arXiv:2409.13672, 2024 - arxiv.org
The presence of non-convexity in smooth optimization problems arising from deep learning
has sparked new smoothness conditions in the literature and corresponding convergence …

Gradient Descent-Based Task-Orientation Robot Control Enhanced with Gaussian Process Predictions

L Roveda, M Pavone - IEEE Robotics and Automation Letters, 2024 - ieeexplore.ieee.org
This letter proposes a novel force-based task-orientation controller for interaction tasks with
environmental orientation uncertainties. The main aim of the controller is to align the robot …

Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed

S Chezhegov, Y Klyukin, A Semenov… - arXiv preprint arXiv …, 2024 - arxiv.org
Methods with adaptive stepsizes, such as AdaGrad and Adam, are essential for training
modern Deep Learning models, especially Large Language Models. Typically, the noise in …