Neural optimizers with hypergradients for tuning parameter-wise learning rates

J Fu, R Ng, D Chen, I Ilievski, C Pal… - JMLR: workshop and …, 2017 - researchgate.net
Abstract
Recent studies show that LSTM-based neural optimizers are competitive with state-of-the-art hand-designed optimization methods for short horizons. Existing neural optimizers learn how to update the optimizee parameters by directly predicting the product of the learning rates and the gradients, which we suspect makes the training task unnecessarily difficult. Instead, we train a neural optimizer to control only the learning rates of another optimizer, using the gradients of the training loss with respect to those learning rates (hypergradients). Furthermore, under the assumption that learning rates tend to remain unchanged over a certain number of iterations, the neural optimizer is only allowed to propose learning rates every S iterations, with the rates held fixed in between; this enables it to generalize to longer horizons. The optimizee is trained by Adam on MNIST, and our neural optimizer learns to tune Adam's parameter-wise learning rates. After 5 meta-iterations, another optimizee trained by Adam, whose learning rates are tuned by the learned but frozen neural optimizer, outperforms optimizees trained by existing hand-designed and learned neural optimizers in terms of convergence rate and final accuracy for long horizons across several datasets.
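For intuition, the sketch below shows how a parameter-wise hypergradient can be computed and used to adapt learning rates every S iterations on a toy problem. It is only an illustration under simplifying assumptions: a plain gradient update and a hypergradient-descent rule stand in for the paper's Adam optimizee and LSTM controller, and the loss, the meta step size `beta`, and the window length `S` are hypothetical choices, not the authors' setup. For an update theta_t = theta_{t-1} - alpha * g_{t-1}, the elementwise hypergradient dL(theta_t)/dalpha is -g_t * g_{t-1}.

```python
import numpy as np

# Illustrative sketch only: parameter-wise learning rates adapted via
# hypergradients on a toy least-squares loss. All names below (loss, grad,
# alpha, beta, S) are assumptions for this example.

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 5))
b = rng.standard_normal(10)

def loss(theta):
    r = A @ theta - b
    return 0.5 * r @ r

def grad(theta):
    return A.T @ (A @ theta - b)

theta = np.zeros(5)
alpha = np.full(5, 1e-2)      # one learning rate per parameter
beta = 1e-4                   # step size on the learning rates themselves
S = 10                        # rates are only re-proposed every S iterations
prev_g = np.zeros(5)          # gradient used in the previous parameter update
hypergrad = np.zeros(5)

for t in range(1, 501):
    g = grad(theta)
    # dL(theta_t)/dalpha = -g_t * g_{t-1} (elementwise), accumulated
    # over the S-step window during which alpha is held fixed.
    hypergrad += -g * prev_g
    if t % S == 0:
        alpha = np.clip(alpha - beta * hypergrad, 1e-6, 1.0)
        hypergrad[:] = 0.0
    theta = theta - alpha * g
    prev_g = g

print("final loss:", loss(theta))
```

In the paper the quantity accumulated here would instead be fed to a learned (and, at test time, frozen) neural controller that proposes the next learning rates, rather than to a fixed hypergradient-descent step.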