With the continuous growth in the number of parameters of transformer-based pretrained language models (PLMs), particularly the emergence of large language models (LLMs) with …
With the prevalence of pre-trained language models (PLMs) and the pre-training–fine-tuning paradigm, it has been consistently shown that larger models tend to yield better …
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …
Despite this success, fine-tuning large-scale PLMs incurs prohibitive adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining …
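To make the storage side of that cost concrete, a back-of-the-envelope sketch follows; the 7B-parameter model size, fp16 storage, and the ~0.1% bias fraction are illustrative assumptions, not figures from the abstract:

```python
# Rough per-task adaptation storage (illustrative numbers, not from the paper).
FP16_BYTES = 2

full_params = 7e9                      # assumed model size: 7B parameters
full_copy_gb = full_params * FP16_BYTES / 1e9
print(f"full fine-tuned copy per task: ~{full_copy_gb:.0f} GB")   # ~14 GB

bias_fraction = 0.001                  # biases are roughly ~0.1% of parameters
bias_copy_mb = full_params * bias_fraction * FP16_BYTES / 1e6
print(f"bias-only delta per task:      ~{bias_copy_mb:.0f} MB")   # ~14 MB
```

Retaining a full copy per task therefore scales in gigabytes, while a bias-only (or similarly sparse) delta scales in megabytes.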
We introduce BitFit, a sparse fine-tuning method where only the bias terms of the model (or a subset of them) are modified. We show that with small-to-medium training data …
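A minimal PyTorch sketch of the bias-only idea; the stand-in network, optimizer, and learning rate are illustrative choices, not details from the paper:

```python
import torch
import torch.nn as nn

# Stand-in network; in practice this would be a pretrained transformer.
model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))

# BitFit-style sparse fine-tuning: freeze everything except the bias terms.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith("bias")

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
print(f"trainable: {sum(p.numel() for p in trainable)} of "
      f"{sum(p.numel() for p in model.parameters())} parameters")
```

Only the bias vectors receive gradient updates, so the per-task delta is a tiny fraction of the full parameter count.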
During typical gradient-based training of deep neural networks, all of the model's parameters are updated at each iteration. Recent work has shown that it is possible to …
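A hedged sketch of training with a fixed sparse update mask, in the spirit of that line of work; the top-fraction-by-gradient-magnitude scoring below is a simple stand-in for the Fisher-information criterion used in such methods, and the toy model and data are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)               # stand-in for a pretrained network
x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
loss_fn = nn.CrossEntropyLoss()

# 1) Score parameters once and fix a sparse mask (roughly the top 10% per tensor).
loss_fn(model(x), y).backward()
masks = {
    name: (p.grad.abs() >=
           p.grad.abs().flatten().kthvalue(int(0.9 * p.numel())).values)
    for name, p in model.named_parameters()
}
model.zero_grad()

# 2) Train, zeroing gradients outside the mask so only that fixed subset updates.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(5):
    loss_fn(model(x), y).backward()
    for name, p in model.named_parameters():
        p.grad *= masks[name]
    opt.step()
    opt.zero_grad()
```

Because the mask is chosen once and held fixed, the set of updated parameters is known in advance, which is what makes communicating or storing the update cheap.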
D. Guo, A. M. Rush, and Y. Kim. arXiv preprint arXiv:2012.07463, 2020.
While task-specific finetuning of pretrained networks has led to significant empirical advances in NLP, the large size of networks makes finetuning difficult to deploy in multi-task …
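A simplified sketch of the diff-vector idea behind that abstract: the pretrained weights stay frozen and a task-specific difference is learned on top. The L1 penalty here is a plain stand-in for the L0-norm relaxation the paper actually uses, and the layer, data, and penalty weight are assumptions:

```python
import torch
import torch.nn as nn

class DiffLinear(nn.Module):
    """Frozen pretrained weight plus a trainable, sparsity-encouraged diff."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.weight = linear.weight.detach()     # frozen base weights
        self.bias = linear.bias.detach()
        self.diff = nn.Parameter(torch.zeros_like(self.weight))

    def forward(self, x):
        return nn.functional.linear(x, self.weight + self.diff, self.bias)

base = nn.Linear(16, 4)                          # stand-in pretrained layer
layer = DiffLinear(base)                         # only `diff` is trainable
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)

x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
task_loss = nn.CrossEntropyLoss()(layer(x), y)
loss = task_loss + 1e-3 * layer.diff.abs().sum()  # L1 encourages a sparse diff
loss.backward()
opt.step()
```

Since only the sparse diff differs across tasks, multi-task deployment can share one copy of the pretrained weights and store a small per-task delta.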
The last decade of machine learning has seen drastic increases in scale and capabilities. Deep neural networks (DNNs) are increasingly being deployed in the real world. However …
Transfer learning has recently become the dominant paradigm of machine learning. Pre-trained models fine-tuned for downstream tasks achieve better performance with fewer …