Y Liu,
R Pan,
T Zhang - arXiv preprint arXiv:2406.15244, 2024 - arxiv.org
Adaptive gradient methods have been widely adopted in training large-scale deep neural
networks, especially large foundation models. Despite the huge success in practice, their …