Should you mask 15% in masked language modeling?

A Wettig, T Gao, Z Zhong, D Chen - arXiv preprint arXiv:2202.08005, 2022 - arxiv.org
Masked language models (MLMs) conventionally mask 15% of tokens due to the belief that
more masking would leave insufficient context to learn good representations; this masking …
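For context, a minimal sketch of the conventional BERT-style corruption this entry questions (the paper reports that substantially higher rates, e.g. around 40%, can match or beat 15%). The 80/10/10 replacement split follows Devlin et al. (2019), and the helper name `mask_tokens` is illustrative, not taken from the paper:

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, vocab, mask_rate=0.15, seed=None):
    """BERT-style MLM corruption: sample `mask_rate` of positions as
    prediction targets; of those, 80% become [MASK], 10% become a
    random vocabulary token, and 10% keep the original token."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = {}  # position -> original token the model must predict
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK_TOKEN
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)
            # else: leave the token visible but still predict it
    return corrupted, labels

# The paper's question amounts to varying mask_rate, e.g. 0.4 instead of 0.15:
vocab = ["the", "cat", "sat", "on", "mat"]
print(mask_tokens("the cat sat on the mat".split(), vocab, mask_rate=0.4, seed=0))
```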

Emerging property of masked token for effective pre-training

H Choi, H Lee, S Joung, H Park, J Kim… - European Conference on …, 2024 - Springer
Driven by the success of Masked Language Modeling (MLM), the realm of self-supervised
learning for computer vision has been invigorated by the central role of Masked Image …

Parameter-efficient fine-tuning without introducing new latency

B Liao, Y Meng, C Monz - arXiv preprint arXiv:2305.16742, 2023 - arxiv.org
Parameter-efficient fine-tuning (PEFT) of pre-trained language models has recently
demonstrated remarkable achievements, effectively matching the performance of full fine …

Make pre-trained model reversible: From parameter to memory efficient fine-tuning

B Liao, S Tan, C Monz - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Parameter-efficient fine-tuning (PEFT) of pre-trained language models (PLMs) has emerged
as a highly successful approach, with training only a small number of parameters without …

ApiQ: Finetuning of 2-bit quantized large language model

B Liao, C Herold, S Khadivi, C Monz - arXiv preprint arXiv:2402.05147, 2024 - arxiv.org
Memory-efficient finetuning of large language models (LLMs) has recently attracted huge
attention with the increasing size of LLMs, primarily due to the constraints posed by GPU …

Representation deficiency in masked language modeling

Y Meng, J Krishnan, S Wang, Q Wang, Y Mao… - arXiv preprint arXiv …, 2023 - arxiv.org
Masked Language Modeling (MLM) has been one of the most prominent approaches for
pretraining bidirectional text encoders due to its simplicity and effectiveness. One notable …

ExLM: Rethinking the Impact of [MASK] Tokens in Masked Language Models

K Zheng, J Yang, S Liang, B Feng, Z Liu, W Ju… - arXiv preprint arXiv …, 2025 - arxiv.org
Masked Language Models (MLMs) have achieved remarkable success in many self-
supervised representation learning tasks. MLMs are trained by randomly replacing some …

Domain Adaptation of Named Entity Recognition for Plant Health Monitoring

M Borovikova - 2024 - theses.hal.science
The increasing complexity of agricultural ecosystems and the urgent need for effective plant
health monitoring necessitate advanced technological solutions for processing textual data …