SparseGPT: Massive language models can be accurately pruned in one-shot

E Frantar, D Alistarh - International Conference on Machine …, 2023 - proceedings.mlr.press
We show for the first time that large-scale generative pretrained transformer (GPT) family
models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal …
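A minimal sketch of what "one-shot pruning to 50% sparsity, no retraining" means in practice, using plain magnitude pruning of a single linear layer. This is an illustration only, not the second-order, reconstruction-based SparseGPT algorithm described in the paper; the layer size and sparsity level are arbitrary.

```python
# Minimal sketch: one-shot unstructured pruning of one linear layer to 50% sparsity.
# Illustrative magnitude pruning only; NOT the Hessian-based SparseGPT method.
import torch

def prune_one_shot(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

layer = torch.nn.Linear(4096, 4096)  # arbitrary toy layer
with torch.no_grad():
    layer.weight.copy_(prune_one_shot(layer.weight, sparsity=0.5))
print(f"sparsity: {(layer.weight == 0).float().mean():.2%}")
```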

Deja Vu: Contextual sparsity for efficient LLMs at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …
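A minimal sketch of the idea of contextual (input-dependent) sparsity in an MLP block: for each token, only a small subset of hidden neurons contributes. Deja Vu trains cheap predictors to choose that subset ahead of time; here, purely for illustration, the subset is chosen by ranking the actual pre-activations, and all weights are toy random tensors.

```python
# Minimal sketch of contextual sparsity: per-input top-k neuron selection in an MLP.
# Deja Vu uses learned low-cost predictors; here we rank by the true pre-activations
# only to make the idea concrete.
import torch

def contextual_sparse_mlp(x, W1, b1, W2, b2, keep: int = 512):
    # x: (d_model,), W1: (d_ff, d_model), W2: (d_model, d_ff)
    pre = W1 @ x + b1                 # full pre-activations (what a predictor would approximate)
    idx = pre.topk(keep).indices      # indices of the neurons kept for this input
    h = torch.relu(pre[idx])          # activations of the selected neurons only
    return W2[:, idx] @ h + b2        # project back using the matching columns

d_model, d_ff = 1024, 4096
x = torch.randn(d_model)
W1, b1 = torch.randn(d_ff, d_model) / d_model**0.5, torch.zeros(d_ff)
W2, b2 = torch.randn(d_model, d_ff) / d_ff**0.5, torch.zeros(d_model)
y = contextual_sparse_mlp(x, W1, b1, W2, b2)
print(y.shape)  # torch.Size([1024])
```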

Optimal brain compression: A framework for accurate post-training quantization and pruning

E Frantar, D Alistarh - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We consider the problem of model compression for deep neural networks (DNNs) in the
challenging one-shot/post-training setting, in which we are given an accurate trained model …
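As a reminder of the background this framework builds on, the classic Optimal Brain Surgeon quantities for the layer-wise post-training setting can be written as follows; notation and constant factors are assumptions here and may differ from the paper's exact formulation.

```latex
% Layer-wise compression objective for a linear layer W with calibration inputs X,
% and its Hessian (up to constant factors).
\[
  \hat{W} \;=\; \arg\min_{\hat{W}} \;\bigl\| W X - \hat{W} X \bigr\|_2^2,
  \qquad H \;=\; 2\, X X^{\top}.
\]
% Optimal Brain Surgeon: error increase from zeroing weight w_q, and the
% compensating update to the remaining weights.
\[
  \rho_q \;=\; \frac{w_q^2}{2\,[H^{-1}]_{qq}},
  \qquad
  \delta w \;=\; -\,\frac{w_q}{[H^{-1}]_{qq}}\, H^{-1} e_q .
\]
```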

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …

The Optimal BERT Surgeon: Scalable and accurate second-order pruning for large language models

E Kurtic, D Campos, T Nguyen, E Frantar… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformer-based language models have become a key building block for natural
language processing. While these models are extremely accurate, they can be too large and …

ReLU strikes back: Exploiting activation sparsity in large language models

I Mirzadeh, K Alizadeh, S Mehta, CC Del Mundo… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) with billions of parameters have drastically transformed AI
applications. However, their demanding computation during inference has raised significant …

The lazy neuron phenomenon: On emergence of activation sparsity in transformers

Z Li, C You, S Bhojanapalli, D Li, AS Rawat… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper studies the curious phenomenon, observed in machine learning models with Transformer
architectures, that their activation maps are sparse. By activation map we refer to the …
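A minimal sketch of how the activation sparsity discussed in the two entries above can be measured: the fraction of post-ReLU hidden activations in a Transformer-style MLP block that are exactly zero. The weights are toy random tensors; no specific model is assumed.

```python
# Minimal sketch: measure activation sparsity (fraction of exact zeros) in the
# post-ReLU activation map of an MLP block. Toy random weights only.
import torch

def activation_sparsity(x, W1, b1):
    """Fraction of post-ReLU hidden activations that are exactly zero."""
    h = torch.relu(x @ W1.T + b1)      # (tokens, d_ff) activation map
    return (h == 0).float().mean().item()

tokens, d_model, d_ff = 128, 1024, 4096
x = torch.randn(tokens, d_model)
W1, b1 = torch.randn(d_ff, d_model) / d_model**0.5, torch.zeros(d_ff)
print(f"zero activations: {activation_sparsity(x, W1, b1):.1%}")  # ~50% for random weights
```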

μBrain: An event-driven and fully synthesizable architecture for spiking neural networks

J Stuijt, M Sifalakis, A Yousefzadeh… - Frontiers in …, 2021 - frontiersin.org
Brain-inspired neuromorphic computing architectures are a candidate paradigm for Artificial
Intelligence (AI) at the edge that can meet strict energy and …

An ensemble of a boosted hybrid of deep learning models and technical analysis for forecasting stock prices

AF Kamara, E Chen, Z Pan - Information Sciences, 2022 - Elsevier
For several years, modeling and forecasting stock prices have been
extremely challenging for the business community and researchers as a result of the …

NISPA: Neuro-inspired stability-plasticity adaptation for continual learning in sparse networks

MB Gurbuz, C Dovrolis - arXiv preprint arXiv:2206.09117, 2022 - arxiv.org
The goal of continual learning (CL) is to learn different tasks over time. The main desiderata
associated with CL are to maintain performance on older tasks, leverage the latter to …