SparseGPT: Massive language models can be accurately pruned in one-shot

E Frantar, D Alistarh - International Conference on Machine …, 2023 - proceedings.mlr.press
We show for the first time that large-scale generative pretrained transformer (GPT) family
models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal …

GPTQ: Accurate post-training quantization for generative pre-trained transformers

E Frantar, S Ashkboos, T Hoefler, D Alistarh - arXiv preprint arXiv …, 2022 - arxiv.org
Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart
through breakthrough performance across complex language modelling tasks, but also by …

A simple and effective pruning approach for large language models

M Sun, Z Liu, A Bair, JZ Kolter - arXiv preprint arXiv:2306.11695, 2023 - arxiv.org
As their size increases, Large Language Models (LLMs) are natural candidates for network
pruning methods: approaches that drop a subset of network weights while striving to …

Optimal brain compression: A framework for accurate post-training quantization and pruning

E Frantar, D Alistarh - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We consider the problem of model compression for deep neural networks (DNNs) in the
challenging one-shot/post-training setting, in which we are given an accurate trained model …

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …

OPTQ: Accurate quantization for generative pre-trained transformers

E Frantar, S Ashkboos, T Hoefler… - … Conference on Learning …, 2022 - openreview.net
Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart
through breakthrough performance across complex language modelling tasks, but also by …

Fishr: Invariant gradient variances for out-of-distribution generalization

A Rame, C Dancette, M Cord - International Conference on …, 2022 - proceedings.mlr.press
Learning robust models that generalize well under changes in the data distribution is critical
for real-world applications. To this end, there has been a growing surge of interest to learn …

Training neural networks with fixed sparse masks

YL Sung, V Nair, CA Raffel - Advances in Neural …, 2021 - proceedings.neurips.cc
During typical gradient-based training of deep neural networks, all of the model's
parameters are updated at each iteration. Recent work has shown that it is possible to …

Group fisher pruning for practical network compression

L Liu, S Zhang, Z Kuang, A Zhou… - International …, 2021 - proceedings.mlr.press
Network compression has been widely studied since it is able to reduce the memory and
computation cost during inference. However, previous methods seldom deal with …

The Optimal BERT Surgeon: Scalable and accurate second-order pruning for large language models

E Kurtic, D Campos, T Nguyen, E Frantar… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformer-based language models have become a key building block for natural
language processing. While these models are extremely accurate, they can be too large and …