SparseGPT: Massive language models can be accurately pruned in one-shot

E Frantar, D Alistarh - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Abstract
We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
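To make the semi-structured patterns mentioned in the abstract concrete, the sketch below enforces an n:m pattern (e.g., 2:4) on a weight matrix by keeping only the n largest-magnitude entries in every contiguous group of m. This is purely illustrative of what the 2:4 / 4:8 layouts mean; SparseGPT itself selects and reconstructs the remaining weights with a Hessian-based procedure rather than plain magnitude selection, and the function name here is a hypothetical helper, not part of the released code.

```python
import torch

def enforce_n_m_sparsity(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Illustrative only: keep the n largest-magnitude weights in every
    contiguous group of m along each row (the 2:4 and 4:8 patterns from the
    abstract). Not SparseGPT's actual selection/reconstruction procedure."""
    rows, cols = weight.shape
    assert cols % m == 0, "column count must be divisible by the group size m"
    groups = weight.reshape(rows, cols // m, m)  # split each row into groups of m
    # indices of the (m - n) smallest-magnitude entries per group; these get zeroed
    _, drop_idx = torch.topk(groups.abs(), m - n, dim=-1, largest=False)
    mask = torch.ones_like(groups, dtype=torch.bool)
    mask.scatter_(-1, drop_idx, False)
    return (groups * mask).reshape(rows, cols)

# Example: a 2:4 pattern leaves at most 2 nonzeros in every group of 4 weights,
# i.e. 50% sparsity in a layout that sparse GPU kernels can exploit.
w = torch.randn(4, 8)
w_sparse = enforce_n_m_sparsity(w, n=2, m=4)
print((w_sparse == 0).float().mean())  # ~0.5
```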