Authors
Mengjie Zhao*, Tao Lin*, Martin Jaggi, Hinrich Schütze
Publication date
2020/4/26
Conference
EMNLP 2020 - Empirical Methods in Natural Language Processing
Description
We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning. Extensive evaluations of masking BERT and RoBERTa on a series of NLP tasks show that our masking scheme yields performance comparable to finetuning, yet has a much smaller memory footprint when several tasks need to be inferred simultaneously. Through intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks. Analyzing the loss landscape, we show that masking and finetuning produce models that reside in minima that can be connected by a line segment with nearly constant test accuracy. This confirms that masking can be utilized as an efficient alternative to finetuning.
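The abstract describes learning selective binary masks over frozen pretrained weights instead of finetuning them. Below is a minimal, hypothetical sketch of that idea in PyTorch (not the authors' code): real-valued scores are binarized with a threshold in the forward pass and trained with a straight-through estimator, while the pretrained weights themselves stay fixed. All names and hyperparameters here are illustrative assumptions.

```python
# Illustrative sketch only, not the authors' implementation.
import torch
import torch.nn as nn


class Binarize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores, threshold=0.5):
        # Hard 0/1 mask obtained by thresholding real-valued scores.
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass gradients to the scores unchanged.
        return grad_output, None


class MaskedLinear(nn.Module):
    """Linear layer whose frozen pretrained weight is elementwise-masked."""

    def __init__(self, pretrained: nn.Linear, init_score: float = 0.5):
        super().__init__()
        # Pretrained weight and bias are frozen; only the mask scores train.
        self.weight = nn.Parameter(pretrained.weight.detach(), requires_grad=False)
        self.bias = (nn.Parameter(pretrained.bias.detach(), requires_grad=False)
                     if pretrained.bias is not None else None)
        self.scores = nn.Parameter(torch.full_like(self.weight, init_score))

    def forward(self, x):
        mask = Binarize.apply(self.scores)
        return nn.functional.linear(x, self.weight * mask, self.bias)
```

Because only the binary masks (one bit per weight per task) differ across tasks, serving several tasks at once requires storing a single copy of the pretrained weights plus one lightweight mask per task, which is the memory advantage the abstract refers to.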
Total citations