Escaping saddle points with adaptive gradient methods

M Staib, S Reddi, S Kale, S Kumar, S Sra

International Conference on Machine Learning, 2019 • proceedings.mlr.press

Abstract

Adaptive methods such as Adam and RMSProp are widely used in deep learning but are not well understood. In this paper, we seek a crisp, clean and precise characterization of their behavior in nonconvex settings. To this end, we first provide a novel view of adaptive methods as preconditioned SGD, where the preconditioner is estimated in an online manner. By studying the preconditioner on its own, we elucidate its purpose: it rescales the stochastic gradient noise to be isotropic near stationary points, which helps escape saddle points. Furthermore, we show that adaptive methods can efficiently estimate the aforementioned preconditioner. By gluing together these two components, we provide the first (to our knowledge) second-order convergence result for any adaptive method. The key insight from our analysis is that, compared to SGD, adaptive methods escape saddle points faster, and can converge faster overall to second-order stationary points.
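
The "preconditioned SGD" view described in the abstract can be illustrated with a minimal sketch. It assumes the standard diagonal RMSProp update, not the paper's exact algorithm or analysis. The function names, the toy saddle objective f(x) = 0.5 * (x0^2 - x1^2), the noise scales, and all hyperparameters are hypothetical choices made for illustration only.

```python
import math
import random


def rmsprop_as_preconditioned_sgd(grad_fn, x0, steps=200, lr=0.01,
                                  beta=0.9, eps=1e-8, seed=0):
    """Run diagonal RMSProp written explicitly as preconditioned SGD.

    g_hat is an exponential moving average of squared stochastic gradients
    (an online estimate of the diagonal second moment). Each step applies
    x <- x - lr * A * g with the diagonal preconditioner A ~ g_hat^(-1/2).
    """
    rng = random.Random(seed)
    x = list(x0)
    g_hat = [0.0] * len(x)
    for _ in range(steps):
        g = grad_fn(x, rng)
        # Online estimate of E[g_i^2] for each coordinate.
        g_hat = [beta * v + (1.0 - beta) * gi * gi for v, gi in zip(g_hat, g)]
        # Diagonal preconditioner: large-variance coordinates are scaled down.
        precond = [1.0 / (math.sqrt(v) + eps) for v in g_hat]
        # Preconditioned SGD step.
        x = [xi - lr * a * gi for xi, a, gi in zip(x, precond, g)]
    return x, g_hat


def noisy_saddle_grad(x, rng):
    """Stochastic gradient of f(x) = 0.5 * (x0^2 - x1^2).

    The noise is deliberately anisotropic (assumed scales 1.0 and 0.01),
    with little noise along the escape direction x1.
    """
    return [x[0] + rng.gauss(0.0, 1.0), -x[1] + rng.gauss(0.0, 0.01)]


# Start exactly at the saddle point (0, 0).
x_final, second_moment = rmsprop_as_preconditioned_sgd(noisy_saddle_grad, [0.0, 0.0])
```

In this sketch, dividing each coordinate by the square root of its estimated second moment gives the two coordinates steps of comparable size even though their raw noise scales differ by a factor of 100. This is a rough, diagonal analogue of the noise rescaling the abstract describes.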

Cited by: 100 • All 9 versions
