This work investigates the effectiveness of schedule-free methods, developed by A. Defazio et al. (NeurIPS 2024), in nonconvex optimization settings, inspired by their remarkable …
K Ahn, A Cutkosky - arXiv preprint arXiv:2405.18199, 2024 - arxiv.org
In this work, we offer a theoretical analysis of two modern optimization techniques for training large and complex models: (i) adaptive optimization algorithms, such as Adam, and …