SparseGPT: Massive language models can be accurately pruned in one-shot

E Frantar, D Alistarh - International Conference on Machine …, 2023 - proceedings.mlr.press
We show for the first time that large-scale generative pretrained transformer (GPT) family
models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal …
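A minimal sketch of what "one-shot pruning to 50% sparsity, no retraining" means in practice, using plain magnitude pruning of a single linear layer. This is an illustration only, not the second-order, reconstruction-based SparseGPT algorithm described in the paper; the layer size and sparsity level are arbitrary.

```python
# Minimal sketch: one-shot unstructured pruning of one linear layer to 50% sparsity.
# Illustrative magnitude pruning only; NOT the Hessian-based SparseGPT method.
import torch

def prune_one_shot(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

layer = torch.nn.Linear(4096, 4096)  # arbitrary toy layer
with torch.no_grad():
    layer.weight.copy_(prune_one_shot(layer.weight, sparsity=0.5))
print(f"sparsity: {(layer.weight == 0).float().mean():.2%}")
```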

Deja Vu: Contextual sparsity for efficient LLMs at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …
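A minimal sketch of the idea of contextual (input-dependent) sparsity in an MLP block: for each token, only a small subset of hidden neurons contributes. Deja Vu trains cheap predictors to choose that subset ahead of time; here, purely for illustration, the subset is chosen by ranking the actual pre-activations, and all weights are toy random tensors.

```python
# Minimal sketch of contextual sparsity: per-input top-k neuron selection in an MLP.
# Deja Vu uses learned low-cost predictors; here we rank by the true pre-activations
# only to make the idea concrete.
import torch

def contextual_sparse_mlp(x, W1, b1, W2, b2, keep: int = 512):
    # x: (d_model,), W1: (d_ff, d_model), W2: (d_model, d_ff)
    pre = W1 @ x + b1                 # full pre-activations (what a predictor would approximate)
    idx = pre.topk(keep).indices      # indices of the neurons kept for this input
    h = torch.relu(pre[idx])          # activations of the selected neurons only
    return W2[:, idx] @ h + b2        # project back using the matching columns

d_model, d_ff = 1024, 4096
x = torch.randn(d_model)
W1, b1 = torch.randn(d_ff, d_model) / d_model**0.5, torch.zeros(d_ff)
W2, b2 = torch.randn(d_model, d_ff) / d_ff**0.5, torch.zeros(d_model)
y = contextual_sparse_mlp(x, W1, b1, W2, b2)
print(y.shape)  # torch.Size([1024])
```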

Optimal brain compression: A framework for accurate post-training quantization and pruning

E Frantar, D Alistarh - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We consider the problem of model compression for deep neural networks (DNNs) in the
challenging one-shot/post-training setting, in which we are given an accurate trained model …
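As a reminder of the background this framework builds on, the classic Optimal Brain Surgeon quantities for the layer-wise post-training setting can be written as follows; notation and constant factors are assumptions here and may differ from the paper's exact formulation.

```latex
% Layer-wise compression objective for a linear layer W with calibration inputs X,
% and its Hessian (up to constant factors).
\[
  \hat{W} \;=\; \arg\min_{\hat{W}} \;\bigl\| W X - \hat{W} X \bigr\|_2^2,
  \qquad H \;=\; 2\, X X^{\top}.
\]
% Optimal Brain Surgeon: error increase from zeroing weight w_q, and the
% compensating update to the remaining weights.
\[
  \rho_q \;=\; \frac{w_q^2}{2\,[H^{-1}]_{qq}},
  \qquad
  \delta w \;=\; -\,\frac{w_q}{[H^{-1}]_{qq}}\, H^{-1} e_q .
\]
```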

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …

The Optimal BERT Surgeon: Scalable and accurate second-order pruning for large language models

E Kurtic, D Campos, T Nguyen, E Frantar… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformer-based language models have become a key building block for natural
language processing. While these models are extremely accurate, they can be too large and …

ReLU strikes back: Exploiting activation sparsity in large language models

I Mirzadeh, K Alizadeh, S Mehta, CC Del Mundo… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) with billions of parameters have drastically transformed AI
applications. However, their demanding computation during inference has raised significant …

The lazy neuron phenomenon: On emergence of activation sparsity in transformers

Z Li, C You, S Bhojanapalli, D Li, AS Rawat… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper studies the curious phenomenon, observed in machine learning models with Transformer
architectures, that their activation maps are sparse. By activation map we refer to the …
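A minimal sketch of how the activation sparsity discussed in the two entries above can be measured: the fraction of post-ReLU hidden activations in a Transformer-style MLP block that are exactly zero. The weights are toy random tensors; no specific model is assumed.

```python
# Minimal sketch: measure activation sparsity (fraction of exact zeros) in the
# post-ReLU activation map of an MLP block. Toy random weights only.
import torch

def activation_sparsity(x, W1, b1):
    """Fraction of post-ReLU hidden activations that are exactly zero."""
    h = torch.relu(x @ W1.T + b1)      # (tokens, d_ff) activation map
    return (h == 0).float().mean().item()

tokens, d_model, d_ff = 128, 1024, 4096
x = torch.randn(tokens, d_model)
W1, b1 = torch.randn(d_ff, d_model) / d_model**0.5, torch.zeros(d_ff)
print(f"zero activations: {activation_sparsity(x, W1, b1):.1%}")  # ~50% for random weights
```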

μBrain: An event-driven and fully synthesizable architecture for spiking neural networks

J Stuijt, M Sifalakis, A Yousefzadeh… - Frontiers in …, 2021 - frontiersin.org
Brain-inspired neuromorphic computing architectures are a candidate paradigm for Artificial
Intelligence (AI) at the edge that can meet strict energy and …

An ensemble of a boosted hybrid of deep learning models and technical analysis for forecasting stock prices

AF Kamara, E Chen, Z Pan - Information Sciences, 2022 - Elsevier
For several years, modeling and forecasting stock prices have been
extremely challenging for the business community and researchers as a result of the …

NISPA: Neuro-inspired stability-plasticity adaptation for continual learning in sparse networks

MB Gurbuz, C Dovrolis - arXiv preprint arXiv:2206.09117, 2022 - arxiv.org
The goal of continual learning (CL) is to learn different tasks over time. The main desiderata
associated with CL are to maintain performance on older tasks, leverage the latter to …