Understanding Adam optimizer via online learning of updates: Adam is FTRL in disguise

K Ahn, Z Zhang, Y Kook, Y Dai - arXiv preprint arXiv:2402.01567, 2024 - arxiv.org
Despite the success of the Adam optimizer in practice, the theoretical understanding of its
algorithmic components remains limited. In particular, most existing analyses of Adam …
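
For reference, a minimal sketch of the standard Adam update that this paper reinterprets through the FTRL (follow-the-regularized-leader) lens; the hyperparameter names and defaults below are the usual ones, not taken from the paper.

    import numpy as np

    def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        """One step of the standard Adam update with bias-corrected moment estimates."""
        m = beta1 * m + (1 - beta1) * grad        # EMA of gradients (first moment)
        v = beta2 * v + (1 - beta2) * grad**2     # EMA of squared gradients (second moment)
        m_hat = m / (1 - beta1**t)                # bias correction, t starts at 1
        v_hat = v / (1 - beta2**t)
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
        return param, m, v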

General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization

K Ahn, G Magakyan, A Cutkosky - arXiv preprint arXiv:2411.07061, 2024 - arxiv.org
This work investigates the effectiveness of schedule-free methods, developed by A. Defazio
et al. (NeurIPS 2024), in nonconvex optimization settings, inspired by their remarkable …
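
A rough sketch of the schedule-free SGD iteration of Defazio et al. as commonly stated: gradients are evaluated at an interpolation of the fast iterate z and the averaged iterate x, removing the need for a learning-rate schedule. The variable names, the interpolation weight beta, and the simple 1/t averaging weight are illustrative assumptions, not taken from this paper.

    def schedule_free_sgd(grad_fn, x0, lr=0.1, beta=0.9, steps=100):
        """Sketch of schedule-free SGD; returns the averaged iterate x."""
        z = x = x0
        for t in range(1, steps + 1):
            y = (1 - beta) * z + beta * x            # gradient is taken at the interpolation point
            z = z - lr * grad_fn(y)                  # base SGD step on the fast iterate
            x = (1 - 1.0 / t) * x + (1.0 / t) * z    # running (Polyak-style) average
        return x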

Adam with model exponential moving average is effective for nonconvex optimization

K Ahn, A Cutkosky - arXiv preprint arXiv:2405.18199, 2024 - arxiv.org
In this work, we offer a theoretical analysis of two modern optimization techniques for
training large and complex models: (i) adaptive optimization algorithms, such as Adam, and …
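
A minimal sketch of the model exponential moving average referenced in the title: a second copy of the weights tracks the training weights with exponential decay and is typically the copy used at evaluation time. The decay value is a common default, not taken from the paper.

    def update_ema(ema_params, params, decay=0.999):
        """Update the EMA copy of the model parameters after each optimizer step."""
        return [decay * e + (1 - decay) * p for e, p in zip(ema_params, params)]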