Authors
John Chen, Cameron Wolfe, Zhao Li, Anastasios Kyrillidis
Publication date
2020/9/28
Conference
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Description
Momentum is a popular technique in deep learning for gradient-based optimizers. We propose a decaying momentum (Demon) rule, motivated by decaying the total contribution of a gradient to all future updates. Applying Demon to Adam leads to significantly improved training, notably competitive with momentum SGD with learning rate decay, even in settings in which adaptive methods are typically non-competitive. Similarly, applying Demon to momentum SGD improves over momentum SGD with learning rate decay in most cases. Notably, Demon momentum SGD is observed to be significantly less sensitive to parameter tuning than momentum SGD with a learning rate decay schedule, which is critical to training neural networks in practice. Results are demonstrated across a variety of settings and architectures, including image classification, generative models, and language models. Demon is easy to implement and tune, and incurs limited extra computational overhead compared to its vanilla counterparts. Code is readily available.
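A minimal sketch of the idea described above, applied to momentum SGD. The closed-form schedule used here (linearly decaying each gradient's total future contribution over training) follows the published Demon paper, but the function names and the specific update loop are illustrative assumptions, not the authors' reference implementation.

```python
def demon_beta(step, total_steps, beta_init=0.9):
    # Decayed momentum coefficient: the remaining fraction of training
    # (1 - step/total_steps) scales the total future contribution of the
    # current gradient, which for momentum is beta / (1 - beta).
    frac = 1.0 - step / total_steps
    return beta_init * frac / ((1.0 - beta_init) + beta_init * frac)

def demon_sgd_step(param, grad, velocity, lr, step, total_steps, beta_init=0.9):
    # One momentum-SGD update using the decayed coefficient beta_t.
    beta_t = demon_beta(step, total_steps, beta_init)
    velocity = beta_t * velocity + grad
    return param - lr * velocity, velocity

# Example: the momentum coefficient at the start, middle, and end of training.
for t in (0, 5000, 10000):
    print(t, round(demon_beta(t, 10000), 3))  # 0.9 -> ~0.818 -> 0.0
```

The same schedule can be dropped into Adam by replacing its first-moment coefficient with demon_beta at each step; the decay only adds a few scalar operations per update.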
Total citations
[Citations-per-year chart, 2020–2024]