Simple and scalable strategies to continually pre-train large language models

A Ibrahim, B Thérien, K Gupta, ML Richter… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start
the process over again once new data becomes available. A much more efficient solution is …

Addressing loss of plasticity and catastrophic forgetting in continual learning

M Elsayed, AR Mahmood - arXiv preprint arXiv:2404.00781, 2024 - arxiv.org
Deep representation learning methods struggle with continual learning, suffering from both
catastrophic forgetting of useful units and loss of plasticity, often due to rigid and unuseful …

Continual learning under language shift

E Gogoulou, T Lesort, M Boman, J Nivre - International Conference on …, 2024 - Springer
The recent increase in data and model scale for language model pre-training has led to
huge training costs. In scenarios where new data become available over time, updating a …

Utility-based perturbed gradient descent: An optimizer for continual learning

M Elsayed, AR Mahmood - arXiv preprint arXiv:2302.03281, 2023 - arxiv.org
Modern representation learning methods often struggle to adapt quickly under
non-stationarity because they suffer from catastrophic forgetting and decaying plasticity. Such …

Knowledge accumulation in continually learned representations and the issue of feature forgetting

T Hess, E Verwimp, GM van de Ven… - arXiv preprint arXiv …, 2023 - arxiv.org
Continual learning research has shown that neural networks suffer from catastrophic
forgetting "at the output level", but it is debated whether this is also the case at the level of …

Demystifying Forgetting in Language Model Fine-Tuning with Statistical Analysis of Example Associations

X Jin, X Ren - arXiv preprint arXiv:2406.14026, 2024 - arxiv.org
Language models (LMs) are known to suffer from forgetting of previously learned examples
when fine-tuned, breaking stability of deployed LM systems. Despite efforts on mitigating …

The shifting landscape of data: learning to tame distributional shifts

A Ibrahim - 2024 - papyrus.bib.umontreal.ca
Machine learning (ML) models achieve remarkable performance on tasks they are trained
for. However, they are often sensitive to shifts in the data distribution, which may lead to …

Demystifying Language Model Forgetting with Low-Rank Example Associations

X Jin, X Ren - NeurIPS 2024 Workshop on Scalable Continual … - openreview.net
Large language models (LLMs) suffer from forgetting of upstream data when fine-tuned.
Despite efforts on mitigating forgetting, few have investigated whether, and how forgotten …