A survey on deep neural network pruning: taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - arXiv preprint arXiv:2308.06767, 2023 - arxiv.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …

Hardware and software optimizations for accelerating deep neural networks: Survey of current trends, challenges, and the road ahead

M Capra, B Bussolino, A Marchisio, G Masera… - IEEE …, 2020 - ieeexplore.ieee.org
Currently, Machine Learning (ML) is becoming ubiquitous in everyday life. Deep Learning
(DL) is already present in many applications ranging from computer vision for medicine to …

EDEN: Enabling energy-efficient, high-performance deep neural network inference using approximate DRAM

S Koppula, L Orosa, AG Yağlıkçı, R Azizi… - Proceedings of the …, 2019 - dl.acm.org
The effectiveness of deep neural networks (DNN) in vision, speech, and language
processing has prompted a tremendous demand for energy-efficient high-performance DNN …

DSA: More efficient budgeted pruning via differentiable sparsity allocation

X Ning, T Zhao, W Li, P Lei, Y Wang, H Yang - European Conference on …, 2020 - Springer
Budgeted pruning is the problem of pruning under resource constraints. In budgeted
pruning, how to distribute the resources across layers (i.e., sparsity allocation) is the key …
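To make the notion of "sparsity allocation" concrete, here is a minimal sketch of a common magnitude-based baseline: a single global threshold decides how much of the pruning budget each layer absorbs. This is an illustration only, not the DSA method, which learns the allocation differentiably; the function name and toy model are hypothetical.

```python
# Illustrative baseline (not DSA): allocate a global pruning budget across
# layers by ranking all weights under one global magnitude threshold.
import torch
import torch.nn as nn

def allocate_sparsity_by_global_magnitude(model: nn.Module, budget: float):
    """Return {layer_name: sparsity} such that `budget` of all conv/linear
    weights fall below a single global magnitude threshold."""
    scores = {name: m.weight.detach().abs().flatten()
              for name, m in model.named_modules()
              if isinstance(m, (nn.Conv2d, nn.Linear))}
    all_scores = torch.cat(list(scores.values()))
    k = max(1, int(budget * all_scores.numel()))      # weights to prune globally
    threshold = torch.kthvalue(all_scores, k).values  # k-th smallest magnitude
    return {name: (s <= threshold).float().mean().item()
            for name, s in scores.items()}

# Toy model (only its weights are inspected; forward() is never called).
toy = nn.Sequential(nn.Conv2d(3, 16, 3), nn.Conv2d(16, 32, 3), nn.Linear(32, 10))
print(allocate_sparsity_by_global_magnitude(toy, budget=0.5))
```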

Model compression with adversarial robustness: A unified optimization framework

S Gui, H Wang, H Yang, C Yu… - Advances in Neural …, 2019 - proceedings.neurips.cc
Deep model compression has been extensively studied, and state-of-the-art methods can
now achieve high compression ratios with minimal accuracy loss. This paper studies model …

GDP: Stabilized neural network pruning via gates with differentiable polarization

Y Guo, H Yuan, J Tan, Z Wang… - Proceedings of the …, 2021 - openaccess.thecvf.com
Model compression techniques have recently gained explosive attention for obtaining
efficient AI models for various real-time applications. Channel pruning is one …
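A minimal sketch of the general gate-based channel-pruning setup the title refers to: each output channel is scaled by a learnable gate, and a penalty pushes gates toward 0 or 1 so near-zero channels can later be removed. The class name and the binary-entropy penalty are assumptions for illustration, not GDP's exact polarization function.

```python
# Generic gate-based channel pruning sketch (not GDP's exact formulation).
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.gate_logits = nn.Parameter(torch.zeros(out_ch))   # gates start at 0.5

    def gates(self):
        return torch.sigmoid(self.gate_logits)                 # differentiable gates

    def forward(self, x):
        return self.conv(x) * self.gates().view(1, -1, 1, 1)   # scale each channel

def polarization_penalty(g, eps=1e-6):
    # Binary entropy of the gates: minimized when every gate sits at 0 or 1.
    return -(g * (g + eps).log() + (1 - g) * (1 - g + eps).log()).mean()

layer = GatedConv2d(16, 32, 3)
out = layer(torch.randn(2, 16, 8, 8))
loss = out.pow(2).mean() + 0.1 * polarization_penalty(layer.gates())
loss.backward()
```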

Dual-side sparse tensor core

Y Wang, C Zhang, Z Xie, C Guo, Y Liu… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Leveraging sparsity in deep neural network (DNN) models is promising for accelerating
model inference. Yet existing GPUs can only leverage the sparsity from weights but not …

Accelerating sparse DNN models without hardware-support via tile-wise sparsity

C Guo, BY Hsueh, J Leng, Y Qiu… - … Conference for High …, 2020 - ieeexplore.ieee.org
Network pruning can reduce the high computation cost of deep neural network (DNN)
models. However, to maintain their accuracies, sparse models often carry randomly …
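The snippet contrasts random sparsity with a more regular pattern. Below is a hedged sketch of the tile-wise idea, simplified for illustration rather than reproducing the paper's tiling scheme: magnitude pruning is applied independently inside each tile of a weight matrix, so every tile keeps the same fraction of weights.

```python
# Simplified tile-wise pruning sketch: uniform density per T x T tile instead
# of a random, hard-to-accelerate scatter of zeros.
import torch

def tile_wise_prune(w: torch.Tensor, tile: int = 4, keep: float = 0.5) -> torch.Tensor:
    rows, cols = w.shape
    assert rows % tile == 0 and cols % tile == 0, "pad w to a multiple of the tile size"
    out = w.clone()
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            block = out[r:r + tile, c:c + tile]            # a view into `out`
            k = max(1, int(keep * block.numel()))          # weights kept per tile
            thresh = block.abs().flatten().topk(k).values[-1]
            block[block.abs() < thresh] = 0.0              # prune in place
    return out

w = torch.randn(8, 8)
print((tile_wise_prune(w) == 0.0).float().mean())          # roughly 1 - keep
```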

Fire together wire together: A dynamic pruning approach with self-supervised mask prediction

S Elkerdawy, M Elhoushi, H Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Dynamic model pruning is a recent direction that allows for the inference of a different sub-
network for each input sample during deployment. However, current dynamic methods rely …
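A generic sketch of what "a different sub-network per input" looks like in code, assumed for illustration: a tiny head predicts a per-sample channel mask from the pooled input. The cited work additionally trains its mask predictor with a self-supervised objective, which is not shown here, and training through this hard top-k would need a straight-through or Gumbel-style relaxation.

```python
# Input-dependent (dynamic) channel pruning sketch; class name is illustrative.
import torch
import torch.nn as nn

class DynamicPrunedConv(nn.Module):
    def __init__(self, in_ch, out_ch, keep=0.5):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.mask_head = nn.Linear(in_ch, out_ch)   # per-sample channel scores
        self.k = max(1, int(keep * out_ch))

    def forward(self, x):
        scores = self.mask_head(x.mean(dim=(2, 3)))             # (B, out_ch)
        topk = scores.topk(self.k, dim=1).indices
        mask = torch.zeros_like(scores).scatter_(1, topk, 1.0)  # hard 0/1 mask
        return self.conv(x) * mask.view(x.size(0), -1, 1, 1)

layer = DynamicPrunedConv(16, 32, keep=0.25)
y = layer(torch.randn(4, 16, 8, 8))   # each sample keeps a different 25% of channels
```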

GAN slimming: All-in-one GAN compression by a unified optimization framework

H Wang, S Gui, H Yang, J Liu, Z Wang - European Conference on …, 2020 - Springer
Generative adversarial networks (GANs) have gained increasing popularity in various
computer vision applications, and have recently started to be deployed to resource-constrained …