A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …
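For concreteness, the sketch below shows unstructured magnitude pruning, the simplest criterion covered by pruning taxonomies such as this survey's. It is a generic numpy illustration, not an algorithm taken from the paper.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until roughly `sparsity`
    fraction of entries is removed (ties at the threshold are also dropped)."""
    k = int(weights.size * sparsity)          # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights), k - 1, axis=None)[k - 1]
    mask = np.abs(weights) > threshold        # keep only weights above the threshold
    return weights * mask

# Example: prune 90% of a random weight matrix.
W = np.random.randn(256, 256)
W_pruned = magnitude_prune(W, sparsity=0.9)
print(f"sparsity achieved: {(W_pruned == 0).mean():.2f}")
```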

Recent advances in natural language processing via large pre-trained language models: A survey

B Min, H Ross, E Sulem, APB Veyseh… - ACM Computing …, 2023 - dl.acm.org
Large, pre-trained language models (PLMs) such as BERT and GPT have drastically
changed the Natural Language Processing (NLP) field. For numerous NLP tasks …

AdaLoRA: Adaptive budget allocation for parameter-efficient fine-tuning

Q Zhang, M Chen, A Bukharin… - arXiv preprint arXiv …, 2023 - arxiv.org
Fine-tuning large pre-trained language models on downstream tasks has become an
important paradigm in NLP. However, common practice fine-tunes all of the parameters in a …
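A minimal sketch of the SVD-style adapter idea behind AdaLoRA: the weight update is parameterized as P diag(λ) Q and the rank budget is adapted by masking out less important singular values. The |λ|-based importance proxy and the class below are illustrative assumptions, not the paper's exact importance score or budget schedule.

```python
import torch
import torch.nn as nn

class SVDLoRALayer(nn.Module):
    """Low-rank adapter with an adaptable rank budget (hedged AdaLoRA-style sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base                                  # frozen pre-trained layer
        for p in self.base.parameters():
            p.requires_grad_(False)
        d_out, d_in = base.weight.shape
        self.P = nn.Parameter(torch.zeros(d_out, r))      # left factor (zero-init => no initial shift)
        self.Q = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.lam = nn.Parameter(torch.ones(r))            # per-rank "singular values"
        self.register_buffer("mask", torch.ones(r))       # 1 = rank kept, 0 = rank pruned

    def forward(self, x):
        delta = (self.P * (self.lam * self.mask)) @ self.Q   # P diag(lam * mask) Q
        return self.base(x) + x @ delta.T

    @torch.no_grad()
    def shrink_budget(self, keep: int):
        """Keep only the `keep` ranks with the largest |lam| (simple importance proxy)."""
        idx = torch.argsort(self.lam.abs(), descending=True)
        self.mask.zero_()
        self.mask[idx[:keep]] = 1.0
```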

LoSparse: Structured compression of large language models based on low-rank and sparse approximation

Y Li, Y Yu, Q Zhang, C Liang, P He… - International …, 2023 - proceedings.mlr.press
Transformer models have achieved remarkable results in various natural language tasks,
but they are often prohibitively large, requiring massive memories and computational …
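The core decomposition behind this line of work is W ≈ UV + S: a low-rank part plus a sparse correction. The sketch below illustrates that decomposition with a truncated SVD and a top-magnitude residual mask; it is a generic one-shot approximation, not the paper's training procedure.

```python
import numpy as np

def low_rank_plus_sparse(W: np.ndarray, rank: int, keep_frac: float):
    """Approximate W as L + S, with L low-rank and S keeping only the
    largest-magnitude residual entries."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]      # rank-`rank` approximation
    R = W - L                                        # residual
    k = int(R.size * keep_frac)                      # residual entries to keep
    thr = (np.partition(np.abs(R), R.size - k, axis=None)[R.size - k]
           if k > 0 else np.inf)
    S = np.where(np.abs(R) >= thr, R, 0.0)           # sparse correction
    return L, S

W = np.random.randn(512, 128)
L, S = low_rank_plus_sparse(W, rank=16, keep_frac=0.05)
err = np.linalg.norm(W - (L + S)) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```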

Revisiting out-of-distribution robustness in NLP: Benchmarks, analysis, and LLMs evaluations

L Yuan, Y Chen, G Cui, H Gao, F Zou… - Advances in …, 2023 - proceedings.neurips.cc
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of
NLP. We find that the distribution shift settings in previous studies commonly lack adequate …

Task-specific skill localization in fine-tuned language models

A Panigrahi, N Saunshi, H Zhao… - … on Machine Learning, 2023 - proceedings.mlr.press
Pre-trained language models can be fine-tuned to solve diverse NLP tasks, including in few-
shot settings. Thus fine-tuning allows the model to quickly pick up task-specific "skills", but …

PLATON: Pruning large transformer models with upper confidence bound of weight importance

Q Zhang, S Zuo, C Liang, A Bukharin… - International …, 2022 - proceedings.mlr.press
Large Transformer-based models have exhibited superior performance in various natural
language processing and computer vision tasks. However, these models contain enormous …
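A hedged sketch of the upper-confidence-bound idea in the title: a first-order sensitivity term |w · grad| is smoothed with an exponential moving average, its deviation from that average serves as an uncertainty estimate, and the two are combined into a pruning score. The constants and the exact combination below are illustrative assumptions, not the paper's verbatim rule.

```python
import torch

class UCBImportance:
    """PLATON-style pruning score sketch: smoothed importance times smoothed uncertainty."""

    def __init__(self, beta1: float = 0.85, beta2: float = 0.95):
        self.beta1, self.beta2 = beta1, beta2
        self.avg_imp, self.avg_unc = {}, {}

    @torch.no_grad()
    def update(self, name: str, weight: torch.Tensor):
        # Call after loss.backward(), so weight.grad is populated.
        imp = (weight * weight.grad).abs()               # first-order sensitivity
        if name not in self.avg_imp:
            self.avg_imp[name] = torch.zeros_like(imp)
            self.avg_unc[name] = torch.zeros_like(imp)
        self.avg_imp[name] = self.beta1 * self.avg_imp[name] + (1 - self.beta1) * imp
        unc = (imp - self.avg_imp[name]).abs()           # deviation as an uncertainty proxy
        self.avg_unc[name] = self.beta2 * self.avg_unc[name] + (1 - self.beta2) * unc

    def score(self, name: str) -> torch.Tensor:
        # UCB-style score: weights with low importance AND low uncertainty are pruned first.
        return self.avg_imp[name] * self.avg_unc[name]
```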

State-of-the-art generalisation research in NLP: a taxonomy and review

D Hupkes, M Giulianelli, V Dankers, M Artetxe… - arXiv preprint arXiv …, 2022 - arxiv.org
The ability to generalise well is one of the primary desiderata of natural language
processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is …

Edge AI: A taxonomy, systematic review and future directions

SS Gill, M Golec, J Hu, M Xu, J Du, H Wu, GK Walia… - Cluster …, 2025 - Springer
Edge Artificial Intelligence (AI) incorporates a network of interconnected systems
and devices that receive, cache, process, and analyse data in close communication with the …

Structured pruning of self-supervised pre-trained models for speech recognition and understanding

Y Peng, K Kim, F Wu, P Sridhar… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Self-supervised speech representation learning (SSL) has been shown to be effective in various
downstream tasks, but SSL models are usually large and slow. Model compression …
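Structured pruning removes whole units (attention heads, channels, layers) rather than individual weights, so the compressed model runs fast on standard hardware. The sketch below drops whole attention heads from a fused QKV projection using a simple norm-based score; the cited work learns which structures to remove rather than using this fixed heuristic, so treat the function as an illustrative stand-in.

```python
import torch

def prune_attention_heads(qkv_weight: torch.Tensor, num_heads: int, keep: int) -> torch.Tensor:
    """Keep the `keep` attention heads whose parameter slices have the largest L2 norm.
    Assumes a fused QKV weight of shape (3 * num_heads * head_dim, d_model)."""
    d_model = qkv_weight.shape[1]
    head_dim = d_model // num_heads
    # View as (3, num_heads, head_dim, d_model): Q, K, V rows for each head.
    per_head = qkv_weight.view(3, num_heads, head_dim, d_model)
    scores = per_head.pow(2).sum(dim=(0, 2, 3)).sqrt()    # one score per head
    kept = torch.argsort(scores, descending=True)[:keep]
    kept = kept.sort().values                             # preserve original head order
    return per_head[:, kept].reshape(3 * keep * head_dim, d_model)

# Example: 12 heads of size 64 (d_model = 768); keep the 9 strongest heads.
qkv = torch.randn(3 * 12 * 64, 12 * 64)
pruned = prune_attention_heads(qkv, num_heads=12, keep=9)
print(pruned.shape)  # torch.Size([1728, 768])
```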