In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimation (Adam) algorithm for a wide class of optimization objectives. Despite the popularity and …
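For reference, a standard statement of the Adam update (our notation, with step size $\alpha$, decay rates $\beta_1, \beta_2$, stabilizer $\epsilon$, and stochastic gradient $g_t$; not necessarily the exact variant analyzed here) is
$m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$, $\quad v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$,
$\hat m_t = m_t/(1-\beta_1^t)$, $\quad \hat v_t = v_t/(1-\beta_2^t)$, $\quad x_{t+1} = x_t - \alpha\,\hat m_t/(\sqrt{\hat v_t} + \epsilon)$,
where squaring, square root, and division are applied element-wise.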
Traditional analyses in non-convex optimization typically rely on the smoothness assumption, namely that the gradient is Lipschitz continuous. However, recent evidence …
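Concretely, the smoothness assumption requires the gradient to be $L$-Lipschitz for some constant $L > 0$:
$\|\nabla f(x) - \nabla f(y)\| \le L\,\|x - y\|$ for all $x, y$.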
As a fundamental problem in computer vision, 3D object detection is experiencing rapid growth. To extract point-wise features from irregularly and sparsely distributed points …
Gradient clipping is a popular modification to standard (stochastic) gradient descent that, at every iteration, caps the gradient norm at a threshold $c > 0$. It is widely used for …
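In symbols, one standard formulation of the clipped update (our notation, with step size $\eta_t$ and stochastic gradient $g_t$) is
$\tilde g_t = \min\{1,\, c/\|g_t\|\}\, g_t$, $\qquad x_{t+1} = x_t - \eta_t\,\tilde g_t$,
so that $\|\tilde g_t\| \le c$ at every iteration.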
It is widely believed that the implicit regularization of SGD is fundamental to the impressive generalization behavior we observe in neural networks. In this work, we demonstrate that …
B Wang, H Zhang, Z Ma… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We provide a simple convergence proof for AdaGrad optimizing non-convex objectives under only affine noise variance and bounded smoothness assumptions. The proof is …
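A common way the affine noise-variance condition is stated in this literature (the constants $\sigma_0, \sigma_1$ are generic labels, not taken from the paper) is
$\mathbb{E}\big[\|g_t - \nabla f(x_t)\|^2\big] \le \sigma_0^2 + \sigma_1^2\,\|\nabla f(x_t)\|^2$,
which reduces to the usual bounded-variance assumption when $\sigma_1 = 0$.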
The success of the Adam optimizer on a wide array of architectures has made it the default in settings where stochastic gradient descent (SGD) performs poorly. However, our …
Adam is widely adopted in practical applications due to its fast convergence. However, its theoretical analysis is still far from satisfactory. Existing convergence analyses for Adam rely …
A Cutkosky, H Mehta - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails. We show that a combination of gradient clipping …
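In this setting, heavy-tailed gradient noise is typically modeled by assuming only a bounded $p$-th moment for some $p \in (1, 2]$, e.g.
$\mathbb{E}\big[\|g_t - \nabla f(x_t)\|^p\big] \le \sigma^p$
(a standard formulation, not necessarily the exact assumption of this work), so the noise variance may be infinite when $p < 2$.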