R Pan, H Ye, T Zhang - arXiv preprint arXiv:2110.14109, 2021 - arxiv.org
Learning rate schedulers have been widely adopted in training deep neural networks. Despite their practical importance, there is a discrepancy between their practice and their …
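The scheduler practice this snippet refers to can be illustrated with a minimal step-decay rule, one common scheduler family (the function name and parameters here are illustrative, not from the cited paper):

```python
# Minimal sketch of a step-decay learning-rate scheduler: the base rate
# is multiplied by a fixed factor every `epochs_per_drop` epochs.
def step_decay(base_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Return the learning rate for a given epoch under step decay."""
    return base_lr * (drop ** (epoch // epochs_per_drop))

# Rates at epochs 0, 10, and 20 with a base rate of 0.1:
lrs = [step_decay(0.1, e) for e in range(0, 30, 10)]
print(lrs)  # [0.1, 0.05, 0.025]
```

In practice such a rule is usually wrapped in a framework's scheduler API rather than called by hand, but the decayed-rate computation is the same.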
Y Zhao, L Jiang, M Gao, N Jing, C Gu, Q Tang… - arXiv preprint arXiv …, 2022 - arxiv.org
Second-order training methods can converge much faster than first-order optimizers in DNN training, because second-order training utilizes the inversion of the second …
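The speedup from inverting second-order (curvature) information can be seen in a textbook Newton step, which rescales the gradient by the inverse Hessian; this is a generic illustration, not the cited paper's specific method:

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton update: x - H^{-1} g, solving rather than inverting explicitly."""
    return x - np.linalg.solve(hess, grad)

# For a quadratic f(x) = 0.5 x^T A x - b^T x the gradient is A x - b and the
# Hessian is A, so a single Newton step from any point lands on the minimizer.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x0 = np.zeros(2)
x1 = newton_step(A @ x0 - b, A, x0)
print(np.allclose(A @ x1, b))  # True
```

A first-order method on the same quadratic would need many steps, with a rate governed by the Hessian's conditioning; the Newton step removes that dependence at the cost of solving a linear system.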
X Shen, A Ali, S Boyd - Optimization and Engineering, 2023 - Springer
We consider the problem of minimizing a composite convex function with two different access methods: an oracle, for which we can evaluate the value and gradient, and a …
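A standard way to exploit this kind of split access, gradient oracle for the smooth part plus structure in the other part, is proximal gradient descent; the sketch below uses an l1 regularizer with a closed-form prox as the structured part. This is a generic illustration under those assumptions, not the algorithm proposed in the cited paper:

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t*||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad(grad_f, prox_g, x0, step, iters=200):
    """Minimize f(x) + g(x): gradient step on f, then prox step on g."""
    x = x0
    for _ in range(iters):
        x = prox_g(x - step * grad_f(x), step)
    return x

# Example composite problem: 0.5*||Ax - b||^2 + lam*||x||_1
A = np.array([[1.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, 0.1])
lam = 0.5
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda v, t: soft_threshold(v, t * lam)
x = prox_grad(grad_f, prox_g, np.zeros(2), step=0.2)
```

The step size 0.2 is below 1/L for this problem (L = 4, the largest eigenvalue of A^T A), which is the usual condition for convergence of the proximal gradient iteration.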