SparseGPT: Massive language models can be accurately pruned in one-shot

E Frantar, D Alistarh - International Conference on Machine …, 2023 - proceedings.mlr.press
We show for the first time that large-scale generative pretrained transformer (GPT) family
models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal …

GPTQ: Accurate post-training quantization for generative pre-trained transformers

E Frantar, S Ashkboos, T Hoefler, D Alistarh - arXiv preprint arXiv …, 2022 - arxiv.org
Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart
through breakthrough performance across complex language modelling tasks, but also by …

A simple and effective pruning approach for large language models

M Sun, Z Liu, A Bair, JZ Kolter - arXiv preprint arXiv:2306.11695, 2023 - arxiv.org
As their size increases, Large Language Models (LLMs) are natural candidates for network
pruning methods: approaches that drop a subset of network weights while striving to …

Optimal brain compression: A framework for accurate post-training quantization and pruning

E Frantar, D Alistarh - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We consider the problem of model compression for deep neural networks (DNNs) in the
challenging one-shot/post-training setting, in which we are given an accurate trained model …

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …

OPTQ: Accurate quantization for generative pre-trained transformers

E Frantar, S Ashkboos, T Hoefler… - … Conference on Learning …, 2022 - openreview.net
Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart
through breakthrough performance across complex language modelling tasks, but also by …

Fishr: Invariant gradient variances for out-of-distribution generalization

A Rame, C Dancette, M Cord - International Conference on …, 2022 - proceedings.mlr.press
Learning robust models that generalize well under changes in the data distribution is critical
for real-world applications. To this end, there has been a growing surge of interest to learn …

Training neural networks with fixed sparse masks

YL Sung, V Nair, CA Raffel - Advances in Neural …, 2021 - proceedings.neurips.cc
During typical gradient-based training of deep neural networks, all of the model's
parameters are updated at each iteration. Recent work has shown that it is possible to …

Group fisher pruning for practical network compression

L Liu, S Zhang, Z Kuang, A Zhou… - International …, 2021 - proceedings.mlr.press
Network compression has been widely studied since it is able to reduce the memory and
computation cost during inference. However, previous methods seldom deal with …

The Optimal BERT Surgeon: Scalable and accurate second-order pruning for large language models

E Kurtic, D Campos, T Nguyen, E Frantar… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformer-based language models have become a key building block for natural
language processing. While these models are extremely accurate, they can be too large and …