Robustness to unbounded smoothness of generalized SignSGD

M Crawshaw, M Liu, F Orabona… - Advances in neural …, 2022 - proceedings.neurips.cc
Traditional analyses in non-convex optimization typically rely on the smoothness
assumption, namely requiring the gradients to be Lipschitz. However, recent evidence …
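
The entry above concerns sign-based updates under relaxed smoothness. As a minimal illustrative sketch (plain SignSGD, not the paper's generalized variant), the update moves by the sign of each gradient coordinate, so its magnitude is a fixed `lr` per coordinate no matter how large the gradient grows:

```python
import numpy as np

def signsgd_step(x, grad, lr=0.01):
    # Plain SignSGD: step by the sign of each gradient coordinate.
    # The per-coordinate update size is lr regardless of gradient
    # magnitude, which is why sign-based methods are natural to study
    # when smoothness (the gradient Lipschitz constant) is unbounded.
    # lr=0.01 is an illustrative value, not taken from the paper.
    return x - lr * np.sign(grad)
```

Running this on a simple quadratic drives each coordinate into a band of width `lr` around the minimizer, where it then oscillates.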

High probability convergence of stochastic gradient methods

Z Liu, TD Nguyen, TH Nguyen… - … on Machine Learning, 2023 - proceedings.mlr.press
In this work, we describe a generic approach to show convergence with high probability for
both stochastic convex and non-convex optimization with sub-Gaussian noise. In previous …

On the convergence of adaptive gradient methods for nonconvex optimization

D Zhou, J Chen, Y Cao, Y Tang, Z Yang… - arXiv preprint arXiv …, 2018 - arxiv.org
Adaptive gradient methods are workhorses in deep learning. However, the convergence
guarantees of adaptive gradient methods for nonconvex optimization have not been …

A single-loop smoothed gradient descent-ascent algorithm for nonconvex-concave min-max problems

J Zhang, P Xiao, R Sun, Z Luo - Advances in neural …, 2020 - proceedings.neurips.cc
The nonconvex-concave min-max problem arises in many machine learning applications,
including minimizing a pointwise maximum of a set of nonconvex functions and robust …

The power of adaptivity in SGD: Self-tuning step sizes with unbounded gradients and affine variance

M Faw, I Tziotis, C Caramanis… - … on Learning Theory, 2022 - proceedings.mlr.press
We study convergence rates of AdaGrad-Norm as an exemplar of adaptive stochastic
gradient methods (SGD), where the step sizes change based on observed stochastic …
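
For concreteness, a minimal sketch of the AdaGrad-Norm update this line of work studies: a single scalar step size, shared by all coordinates, that shrinks with the running sum of squared gradient norms (the hyperparameters `eta` and `b0` are illustrative, not values from the paper):

```python
import numpy as np

def adagrad_norm_step(x, grad, accum, eta=1.0, b0=1e-2):
    # AdaGrad-Norm: one scalar step size for the whole vector,
    # decaying with the accumulated squared gradient norms.
    accum += float(np.dot(grad, grad))        # running sum of ||g_t||^2
    step = eta / np.sqrt(b0**2 + accum)        # self-tuning step size
    return x - step * grad, accum
```

The step size requires no knowledge of the smoothness or noise constants, which is the sense in which it is "self-tuning".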

Adaptive gradient methods at the edge of stability

JM Cohen, B Ghorbani, S Krishnan, N Agarwal… - arXiv preprint arXiv …, 2022 - arxiv.org
Very little is known about the training dynamics of adaptive gradient methods like Adam in
deep learning. In this paper, we shed light on the behavior of these algorithms in the full …

[Book] Learning theory from first principles

F Bach - 2024 - di.ens.fr
This draft textbook is extracted from lecture notes from a class that I have taught
(unfortunately online, but this gave me an opportunity to write more detailed notes) during …

Zero-shot recommender systems

H Ding, Y Ma, A Deoras, Y Wang, H Wang - arXiv preprint arXiv …, 2021 - arxiv.org
Performance of recommender systems (RS) relies heavily on the amount of training data
available. This poses a chicken-and-egg problem for early-stage products, whose amount of …

Convergence of AdaGrad for non-convex objectives: Simple proofs and relaxed assumptions

B Wang, H Zhang, Z Ma… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We provide a simple convergence proof for AdaGrad optimizing non-convex objectives
under only affine noise variance and bounded smoothness assumptions. The proof is …
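
Unlike the scalar AdaGrad-Norm variant, the AdaGrad analyzed here adapts each coordinate separately. A minimal sketch (with illustrative `lr` and `eps`, not values from the paper):

```python
import numpy as np

def adagrad_step(x, grad, accum, lr=0.1, eps=1e-8):
    # Coordinate-wise AdaGrad: each coordinate's effective step size
    # decays with its own accumulated squared gradients, so frequently
    # large coordinates get smaller steps.
    accum = accum + grad**2
    return x - lr * grad / (np.sqrt(accum) + eps), accum
```

The affine-variance assumption in the abstract allows the gradient noise to scale with the gradient norm, a weaker condition than uniformly bounded noise.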

LSTM recurrent neural network for hand gesture recognition using EMG signals

A Toro-Ossaba, J Jaramillo-Tigreros, JC Tejada… - Applied Sciences, 2022 - mdpi.com
Currently, research on gesture recognition systems has been on the rise due to the
capabilities these systems provide to the field of human–machine interaction; however, …