Robustness to unbounded smoothness of generalized SignSGD

M Crawshaw, M Liu, F Orabona… - Advances in neural …, 2022 - proceedings.neurips.cc
Traditional analyses in non-convex optimization typically rely on the smoothness
assumption, namely requiring the gradients to be Lipschitz. However, recent evidence …
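
The entry above concerns sign-based updates under relaxed smoothness. As a minimal illustrative sketch (plain SignSGD, not the paper's generalized variant), the update moves by the sign of each gradient coordinate, so its magnitude is a fixed `lr` per coordinate no matter how large the gradient grows:

```python
import numpy as np

def signsgd_step(x, grad, lr=0.01):
    # Plain SignSGD: step by the sign of each gradient coordinate.
    # The per-coordinate update size is lr regardless of gradient
    # magnitude, which is why sign-based methods are natural to study
    # when smoothness (the gradient Lipschitz constant) is unbounded.
    # lr=0.01 is an illustrative value, not taken from the paper.
    return x - lr * np.sign(grad)
```

Running this on a simple quadratic drives each coordinate into a band of width `lr` around the minimizer, where it then oscillates.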

High probability convergence of stochastic gradient methods

Z Liu, TD Nguyen, TH Nguyen… - … on Machine Learning, 2023 - proceedings.mlr.press
In this work, we describe a generic approach to show convergence with high probability for
both stochastic convex and non-convex optimization with sub-Gaussian noise. In previous …

On the convergence of adaptive gradient methods for nonconvex optimization

D Zhou, J Chen, Y Cao, Y Tang, Z Yang… - arXiv preprint arXiv …, 2018 - arxiv.org
Adaptive gradient methods are workhorses in deep learning. However, the convergence
guarantees of adaptive gradient methods for nonconvex optimization have not been …

A single-loop smoothed gradient descent-ascent algorithm for nonconvex-concave min-max problems

J Zhang, P Xiao, R Sun, Z Luo - Advances in neural …, 2020 - proceedings.neurips.cc
The nonconvex-concave min-max problem arises in many machine learning applications,
including minimizing a pointwise maximum of a set of nonconvex functions and robust …

The power of adaptivity in SGD: Self-tuning step sizes with unbounded gradients and affine variance

M Faw, I Tziotis, C Caramanis… - … on Learning Theory, 2022 - proceedings.mlr.press
We study convergence rates of AdaGrad-Norm as an exemplar of adaptive stochastic
gradient methods (SGD), where the step sizes change based on observed stochastic …
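
For concreteness, a minimal sketch of the AdaGrad-Norm update this line of work studies: a single scalar step size, shared by all coordinates, that shrinks with the running sum of squared gradient norms (the hyperparameters `eta` and `b0` are illustrative, not values from the paper):

```python
import numpy as np

def adagrad_norm_step(x, grad, accum, eta=1.0, b0=1e-2):
    # AdaGrad-Norm: one scalar step size for the whole vector,
    # decaying with the accumulated squared gradient norms.
    accum += float(np.dot(grad, grad))        # running sum of ||g_t||^2
    step = eta / np.sqrt(b0**2 + accum)        # self-tuning step size
    return x - step * grad, accum
```

The step size requires no knowledge of the smoothness or noise constants, which is the sense in which it is "self-tuning".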

Adaptive gradient methods at the edge of stability

JM Cohen, B Ghorbani, S Krishnan, N Agarwal… - arXiv preprint arXiv …, 2022 - arxiv.org
Very little is known about the training dynamics of adaptive gradient methods like Adam in
deep learning. In this paper, we shed light on the behavior of these algorithms in the full …

[Book] Learning theory from first principles

F Bach - 2024 - di.ens.fr
This draft textbook is extracted from lecture notes from a class that I have taught
(unfortunately online, but this gave me an opportunity to write more detailed notes) during …

Zero-shot recommender systems

H Ding, Y Ma, A Deoras, Y Wang, H Wang - arXiv preprint arXiv …, 2021 - arxiv.org
Performance of recommender systems (RS) relies heavily on the amount of training data
available. This poses a chicken-and-egg problem for early-stage products, whose amount of …

Convergence of AdaGrad for non-convex objectives: Simple proofs and relaxed assumptions

B Wang, H Zhang, Z Ma… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We provide a simple convergence proof for AdaGrad optimizing non-convex objectives
under only affine noise variance and bounded smoothness assumptions. The proof is …
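
Unlike the scalar AdaGrad-Norm variant, the AdaGrad analyzed here adapts each coordinate separately. A minimal sketch (with illustrative `lr` and `eps`, not values from the paper):

```python
import numpy as np

def adagrad_step(x, grad, accum, lr=0.1, eps=1e-8):
    # Coordinate-wise AdaGrad: each coordinate's effective step size
    # decays with its own accumulated squared gradients, so frequently
    # large coordinates get smaller steps.
    accum = accum + grad**2
    return x - lr * grad / (np.sqrt(accum) + eps), accum
```

The affine-variance assumption in the abstract allows the gradient noise to scale with the gradient norm, a weaker condition than uniformly bounded noise.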

LSTM recurrent neural network for hand gesture recognition using EMG signals

A Toro-Ossaba, J Jaramillo-Tigreros, JC Tejada… - Applied Sciences, 2022 - mdpi.com
Currently, research on gesture recognition systems has been on the rise due to the
capabilities these systems provide to the field of human–machine interaction; however, …