Stochastic Anderson mixing for nonconvex stochastic optimization

F Wei, C Bao, Y Liu - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Anderson mixing (AM) is an acceleration method for fixed-point iterations. Despite its
success and wide usage in scientific computing, the convergence theory of AM remains …
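For readers unfamiliar with the underlying technique, the following is a minimal sketch of classical (deterministic) Anderson mixing AM(m) for a fixed-point problem x = g(x), not the stochastic variant this paper proposes; the memory size m, solver details, and function names are illustrative assumptions.

```python
import numpy as np

def anderson_mixing(g, x0, m=5, iters=50, tol=1e-10):
    """Classical Anderson mixing AM(m) for the fixed-point problem x = g(x).

    Keeps the last m residuals r_k = g(x_k) - x_k and combines the stored
    g-evaluations with least-squares weights that minimize the mixed residual.
    """
    x = np.asarray(x0, dtype=float)
    X_hist, R_hist = [], []              # histories of g(x_k) and residuals
    for _ in range(iters):
        gx = g(x)
        r = gx - x                       # fixed-point residual
        X_hist.append(gx)
        R_hist.append(r)
        if np.linalg.norm(r) < tol:
            break
        X_hist, R_hist = X_hist[-m:], R_hist[-m:]
        R = np.column_stack(R_hist)      # d x k matrix of recent residuals
        k = R.shape[1]
        ones = np.ones(k)
        # Weights alpha minimize ||R @ alpha|| subject to sum(alpha) = 1,
        # solved here through the bordered (KKT) system with lstsq so a
        # rank-deficient residual matrix does not break the iteration.
        A = np.block([[R.T @ R, ones[:, None]],
                      [ones[None, :], np.zeros((1, 1))]])
        b = np.concatenate([np.zeros(k), [1.0]])
        alpha = np.linalg.lstsq(A, b, rcond=None)[0][:k]
        x = np.column_stack(X_hist) @ alpha   # mixed next iterate (beta = 1)
    return x
```

As a usage check, `anderson_mixing(np.cos, np.array([1.0]))` converges to the fixed point of cos (about 0.739), illustrating the accelerated fixed-point iteration the snippet refers to.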

Eigencurve: Optimal learning rate schedule for SGD on quadratic objectives with skewed Hessian spectrums

R Pan, H Ye, T Zhang - arXiv preprint arXiv:2110.14109, 2021 - arxiv.org
Learning rate schedulers have been widely adopted in training deep neural networks.
Despite their practical importance, there is a discrepancy between their practice and their …
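The setting in the snippet can be illustrated with a small experiment: SGD with noisy gradients on a quadratic whose Hessian has a skewed eigenvalue spectrum, compared under two generic hand-designed schedules. This is not the Eigencurve schedule itself; the spectrum, noise level, and schedules below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quadratic objective f(x) = 0.5 * x^T H x with a skewed (heavy-tailed)
# Hessian spectrum: a few large eigenvalues, many tiny ones.
d = 100
eigs = 1.0 / np.arange(1, d + 1) ** 2
H = np.diag(eigs)

def sgd_quadratic(schedule, iters=2000, noise=0.01):
    """Run SGD with noisy gradients g = Hx + xi under a given LR schedule."""
    x = np.ones(d)
    for t in range(iters):
        grad = H @ x + noise * rng.standard_normal(d)
        x -= schedule(t) * grad
    return 0.5 * x @ H @ x                # final objective value

# Two common hand-designed schedules for comparison (illustrative only).
constant = lambda t: 0.5
inverse_time = lambda t: 1.0 / (1.0 + 0.01 * t)

print("constant LR :", sgd_quadratic(constant))
print("1/t decay   :", sgd_quadratic(inverse_time))
```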

RePAST: A ReRAM-based PIM Accelerator for Second-order Training of DNN

Y Zhao, L Jiang, M Gao, N Jing, C Gu, Q Tang… - arXiv preprint arXiv …, 2022 - arxiv.org
Second-order training methods can converge much faster than first-order optimizers in
DNN training. This is because second-order training utilizes the inversion of the second …
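To make the reference to inverting the second-order matrix concrete, here is a minimal damped Newton update on a toy quadratic. It sketches the generic second-order step that such accelerators target, not RePAST's ReRAM/PIM design; the damping value and toy problem are assumptions for illustration.

```python
import numpy as np

def newton_step(grad, hess, x, damping=1e-3):
    """One second-order update: x <- x - (H + damping*I)^{-1} g.

    The damping term keeps an ill-conditioned Hessian invertible; in
    practice one solves the linear system rather than forming the inverse.
    """
    d = x.shape[0]
    step = np.linalg.solve(hess + damping * np.eye(d), grad)
    return x - step

# Toy example: minimize 0.5*x^T A x - b^T x, where the damped Newton
# iteration reaches the exact minimizer in a few steps.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = np.zeros(2)
for _ in range(3):
    g = A @ x - b
    x = newton_step(g, A, x)
print(x, np.linalg.solve(A, b))   # the two should nearly coincide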

Minimizing oracle-structured composite functions

X Shen, A Ali, S Boyd - Optimization and Engineering, 2023 - Springer
We consider the problem of minimizing a composite convex function with two different
access methods: an oracle, for which we can evaluate the value and gradient, and a …
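The two access methods in the snippet, a value/gradient oracle for one term and exploitable structure for the other, resemble the classic proximal-gradient pattern. The sketch below uses an L1 term with a closed-form proximal operator as an assumed stand-in for the paper's more general structured component; the step size rule and toy data are illustrative.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t*||.||_1 (soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(f_grad, prox_g, x0, step=0.1, iters=500):
    """Minimize f(x) + g(x) when f is seen only through a gradient oracle
    and g only through its proximal operator (one form of 'structured' access)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = prox_g(x - step * f_grad(x), step)
    return x

# Toy instance: f(x) = 0.5*||Ax - b||^2 (oracle access), g(x) = lam*||x||_1.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
lam = 0.5
x_hat = proximal_gradient(
    f_grad=lambda x: A.T @ (A @ x - b),
    prox_g=lambda v, t: prox_l1(v, lam * t),
    x0=np.zeros(10),
    step=1.0 / np.linalg.norm(A, 2) ** 2,   # 1/L for the smooth term
)
print(x_hat)
```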