CAGroup3D: Class-aware grouping for 3D object detection on point clouds

H Wang, L Ding, S Dong, S Shi, A Li… - Advances in Neural …, 2022 - proceedings.neurips.cc
We present a novel two-stage fully sparse convolutional 3D object detection framework,
named CAGroup3D. Our proposed method first generates some high-quality 3D proposals …

Convergence of Adam under relaxed assumptions

H Li, A Rakhlin, A Jadbabaie - Advances in Neural …, 2024 - proceedings.neurips.cc
In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimation
(Adam) algorithm for a wide class of optimization objectives. Despite the popularity and …
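
For reference, the Adam update whose convergence is analyzed in this line of work is (standard form, with stepsize $\eta$, momentum parameters $\beta_1, \beta_2 \in [0, 1)$, a small constant $\epsilon > 0$, and stochastic gradient $g_t$ at iterate $x_t$):

$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$, $\quad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$,
$\hat{m}_t = m_t / (1 - \beta_1^t)$, $\quad \hat{v}_t = v_t / (1 - \beta_2^t)$, $\quad x_{t+1} = x_t - \eta \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$,

with the squaring, square root, and division taken coordinate-wise.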

Robustness to unbounded smoothness of generalized SignSGD

M Crawshaw, M Liu, F Orabona… - Advances in neural …, 2022 - proceedings.neurips.cc
Traditional analyses in non-convex optimization typically rely on the smoothness
assumption, namely requiring the gradients to be Lipschitz. However, recent evidence …
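
Concretely, the classical assumption requires $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y$. A common relaxation in this line of work is $(L_0, L_1)$-smoothness, which only requires $\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|$, so the effective smoothness constant may grow with the gradient norm rather than being bounded.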

RBGNet: Ray-based grouping for 3D object detection

H Wang, S Shi, Z Yang, R Fang… - Proceedings of the …, 2022 - openaccess.thecvf.com
As a fundamental problem in computer vision, 3D object detection is experiencing rapid
growth. To extract the point-wise features from the irregularly and sparsely distributed points …

Revisiting Gradient Clipping: Stochastic bias and tight convergence guarantees

A Koloskova, H Hendrikx… - … Conference on Machine …, 2023 - proceedings.mlr.press
Gradient clipping is a popular modification to standard (stochastic) gradient descent that, at
every iteration, limits the gradient norm to a chosen value $c > 0$. It is widely used for …
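
A minimal sketch of the clip-by-norm operation described above, assuming a NumPy gradient vector g and a threshold c (the function name is illustrative, not from the paper):

    import numpy as np

    def clip_gradient(g, c):
        """Return g rescaled so that its Euclidean norm is at most c (clip-by-norm)."""
        norm = np.linalg.norm(g)
        return g * (c / norm) if norm > c else g

For example, clip_gradient(np.array([3.0, 4.0]), c=1.0) returns a vector of norm 1 in the same direction, while gradients already within the threshold pass through unchanged.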

Stochastic training is not necessary for generalization

J Geiping, M Goldblum, PE Pope, M Moeller… - arXiv preprint arXiv …, 2021 - arxiv.org
It is widely believed that the implicit regularization of SGD is fundamental to the impressive
generalization behavior we observe in neural networks. In this work, we demonstrate that …

Convergence of AdaGrad for non-convex objectives: Simple proofs and relaxed assumptions

B Wang, H Zhang, Z Ma… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We provide a simple convergence proof for AdaGrad optimizing non-convex objectives
under only affine noise variance and bounded smoothness assumptions. The proof is …
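
In this setting, affine noise variance is typically a condition of the form $\mathbb{E}\|g_t - \nabla f(x_t)\|^2 \le \sigma_0^2 + \sigma_1^2 \|\nabla f(x_t)\|^2$, and the AdaGrad iterate (in its scalar-stepsize form) is $x_{t+1} = x_t - \eta \, g_t / \sqrt{\sum_{s \le t} \|g_s\|^2}$; coordinate-wise variants accumulate squared gradients per coordinate instead.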

Noise is not the main factor behind the gap between SGD and Adam on transformers, but sign descent might be

F Kunstner, J Chen, JW Lavington… - arXiv preprint arXiv …, 2023 - arxiv.org
The success of the Adam optimizer on a wide array of architectures has made it the default
in settings where stochastic gradient descent (SGD) performs poorly. However, our …
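
Sign descent here refers to the coordinate-wise update $x_{t+1} = x_t - \eta \, \mathrm{sign}(g_t)$. It coincides with Adam when momentum is switched off ($\beta_1 = \beta_2 = 0$) and $\epsilon \to 0$, since the update then reduces to $-\eta \, g_t / |g_t|$ in each coordinate.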

Provable adaptivity of Adam under non-uniform smoothness

B Wang, Y Zhang, H Zhang, Q Meng, R Sun… - Proceedings of the 30th …, 2024 - dl.acm.org
Adam is widely adopted in practical applications due to its fast convergence. However, its
theoretical analysis is still far from satisfactory. Existing convergence analyses for Adam rely …

High-probability bounds for non-convex stochastic optimization with heavy tails

A Cutkosky, H Mehta - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We consider non-convex stochastic optimization using first-order algorithms for which the
gradient estimates may have heavy tails. We show that a combination of gradient clipping …
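
As a rough illustration only (not necessarily the algorithm analyzed in the paper), gradient clipping can be combined with a momentum buffer as in the sketch below; all names and default values are illustrative.

    import numpy as np

    def clipped_momentum_sgd_step(x, m, grad, lr=0.01, beta=0.9, c=1.0):
        """One step of momentum SGD on a norm-clipped stochastic gradient (generic sketch)."""
        norm = np.linalg.norm(grad)
        g = grad * (c / norm) if norm > c else grad  # clip-by-norm at threshold c
        m = beta * m + (1.0 - beta) * g              # exponential-moving-average momentum buffer
        x = x - lr * m                               # move the iterate along the buffer
        return x, m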