Layerwise optimization by gradient decomposition for continual learning

S Tang, D Chen, J Zhu, S Yu… - Proceedings of the …, 2021 - openaccess.thecvf.com
… novel continual learning framework including gradient decomposition, gradient optimization
and layerwise … We observe that optimizing the gradient update layerwise can further help the …

Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning

J Yoo, Y Liu, F Wood, G Pleiss - arXiv preprint arXiv:2402.09542, 2024 - arxiv.org
… a simple optimizer modification, which maintains the advantages of replay-based continual
learning while improving its stability. Our method, which we refer to as Layerwise Proximal Re…
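
The snippet above describes a proximal point style optimizer modification applied per layer. As a minimal sketch of the generic idea (not the paper's exact method: the penalty weight `lam` and the plain quadratic proximal term here are illustrative assumptions), one proximal gradient step pulls each layer's update back toward its previous parameters:

```python
import numpy as np

def proximal_step(grad, param, prev_param, lr=0.1, lam=1.0):
    """One layerwise proximal gradient step (illustrative sketch).

    Locally minimizes  g . x + (lam / 2) * ||x - prev_param||^2,
    i.e. a gradient step damped toward the previous iterate, which is
    the stabilizing effect a proximal point method provides.
    """
    return param - lr * (grad + lam * (param - prev_param))

# Applied per layer: each layer keeps its own prev_param snapshot,
# so stability is enforced layerwise rather than globally.
w = proximal_step(grad=2.0, param=1.0, prev_param=1.0)
```

With `param == prev_param` the proximal term vanishes and the step reduces to plain gradient descent; the damping only activates once the iterate drifts from its anchor.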

Trgp: Trust region gradient projection for continual learning

S Lin, L Yang, D Fan, J Zhang - arXiv preprint arXiv:2202.02931, 2022 - arxiv.org
continual learning. However, by modifying the model only in the orthogonal direction to the
input space of old tasks, the optimization space of learning … We next define a layer-wise trust …
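
The snippet refers to updating the model only in directions orthogonal to the input space of old tasks. A minimal sketch of that projection step (the orthonormal `basis` for the old-task subspace is assumed given, e.g. from an SVD of stored activations; the trust-region relaxation the paper adds is not shown):

```python
import numpy as np

def project_orthogonal(grad, basis):
    """Remove the components of grad that lie in span(basis).

    basis: (d, k) matrix with orthonormal columns spanning the
    old-task input subspace. The returned update is orthogonal to
    that subspace, so it leaves old-task responses unchanged.
    """
    return grad - basis @ (basis.T @ grad)

# Example: old-task subspace is the first coordinate axis;
# the projected gradient has no component along it.
e1 = np.eye(3)[:, :1]
g = np.array([1.0, 2.0, 3.0])
g_proj = project_orthogonal(g, e1)
```

This is applied layer by layer, with a separate basis per layer; the trust-region variant in the paper additionally reuses selected old-task directions rather than blocking them entirely.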

Plasticity-optimized complementary networks for unsupervised continual learning

A Gomez-Villa, B Twardowski… - Proceedings of the …, 2024 - openaccess.thecvf.com
learning scenario. This paper aims to apply complementary learning systems theory to improve
continual learning … The existing methods [22, 19] can suffer from suboptimal stability and …

Learning where to learn: Gradient sparsity in meta and continual learning

J Von Oswald, D Zhao, S Kobayashi… - Advances in …, 2021 - proceedings.neurips.cc
… Such interference can be reduced with online meta-learning methods which optimize the
base learning algorithm using both present and past data, kept in a replay buffer [49, 15]. Our …

Forget-free continual learning with winning subnetworks

H Kang, RJL Mina, SRH Madjid… - … Machine Learning, 2022 - proceedings.mlr.press
… a continual learning method referred to as Winning SubNetworks (WSN) which sequentially
learns and selects an optimal … continual learning, the model should consider the layer-wise …

A comprehensive survey of continual learning: theory, method and application

L Wang, X Zhang, H Su, J Zhu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
… a layer-wise manner. GAN-Memory [167] takes advantage of FiLM [201] and AdaFM [202] to
… a wide basin in downstream continual learning by optimizing the flatness metric. SLCA [214] …

Pretrained language model in continual learning: A comparative study

T Wu, M Caccia, Z Li, YF Li, G Qi… - … on Learning …, 2022 - research.monash.edu
… different PLMs with a number of layer-wise probing analyses. … and optimization of better
PLM-oriented continual learning … the layer-wise insights of a specific PLM for continual learning. …

Large batch optimization for deep learning using new complete layer-wise adaptive rate scaling

Z Huo, B Gu, H Huang - Proceedings of the AAAI conference on artificial …, 2021 - ojs.aaai.org
Constant learning rate 0.001 is used for all layers and batch size B = 128. After each
epoch, we approximate the gradient variance factor Mk by computing the ratio of 1 …
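
The snippet pairs a constant base learning rate with a layer-wise adaptive scaling. As a sketch of the general layer-wise adaptive rate scaling idea (the trust coefficient `trust` and this particular ratio follow the well-known LARS formulation, not necessarily the paper's complete variant), each layer's step is scaled by the ratio of its weight norm to its gradient norm:

```python
import numpy as np

def layerwise_scaled_update(w, g, base_lr=0.001, trust=0.02):
    """One layer-wise adaptive rate scaling step (illustrative sketch).

    The local rate ||w|| / ||g|| keeps the relative step size comparable
    across layers, which is what makes very large batch sizes trainable.
    """
    local_lr = trust * np.linalg.norm(w) / (np.linalg.norm(g) + 1e-12)
    return w - base_lr * local_lr * g

# Each layer computes its own local_lr from its own w and g,
# while base_lr stays constant across all layers.
w_new = layerwise_scaled_update(np.array([3.0, 4.0]), np.array([0.0, 1.0]))
```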

Beyond not-forgetting: Continual learning with backward knowledge transfer

S Lin, L Yang, D Fan, J Zhang - Advances in Neural …, 2022 - proceedings.neurips.cc
… analysis, we next develop a ContinUal learning method with Backward … correlated old tasks
in a layer-wise manner, and then … To conclude, the optimization problem for learning the new …