Structured pruning for deep convolutional neural networks: A survey

Y He, L Xiao - IEEE Transactions on Pattern Analysis and …, 2023 - ieeexplore.ieee.org
The remarkable performance of deep convolutional neural networks (CNNs) is generally
attributed to their deeper and wider architectures, which can come with significant …
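
As a concrete, much-simplified instance of the structured pruning this survey covers, the sketch below removes whole convolution filters by their L1 norm and shrinks the following layer to match. The layer shapes and the 50% pruning ratio are illustrative assumptions, not anything taken from the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch: prune whole filters of a conv layer by L1 norm.
# Layer sizes and the 50% ratio are illustrative assumptions.
conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
next_conv = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)

ratio = 0.5
norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # L1 norm per output filter
keep = norms.argsort(descending=True)[: int(32 * (1 - ratio))]
keep, _ = keep.sort()

# Rebuild both layers with only the surviving channels: structured pruning
# genuinely shrinks the tensors, unlike masking individual weights.
pruned = nn.Conv2d(16, len(keep), kernel_size=3, padding=1)
pruned.weight.data = conv.weight.data[keep]
pruned.bias.data = conv.bias.data[keep]
pruned_next = nn.Conv2d(len(keep), 64, kernel_size=3, padding=1)
pruned_next.weight.data = next_conv.weight.data[:, keep]
pruned_next.bias.data = next_conv.bias.data

x = torch.randn(1, 16, 8, 8)
print(pruned_next(pruned(x)).shape)  # torch.Size([1, 64, 8, 8])
```

Because entire filters disappear, the resulting tensors are smaller dense tensors, which is what makes structured pruning hardware-friendly compared with weight-level masking.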

Computational complexity evaluation of neural network applications in signal processing

P Freire, S Srivallapanondh, A Napoli… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we present a systematic approach for assessing and comparing the
computational complexity of neural network layers in digital signal processing. We provide …
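
As a rough illustration of the layer-level accounting such an approach rests on, the sketch below counts multiply-accumulate operations (MACs) for a dense layer and a 2-D convolutional layer using the standard closed-form expressions; the example shapes are invented.

```python
# Sketch: count multiply-accumulate operations (MACs) per layer.
# Standard textbook formulas; the example shapes are arbitrary.

def dense_macs(n_in: int, n_out: int) -> int:
    # Each output neuron needs n_in multiply-adds.
    return n_in * n_out

def conv2d_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    # Each output position of each output channel needs c_in * k * k MACs.
    return c_in * k * k * c_out * h_out * w_out

print(dense_macs(512, 256))            # 131072
print(conv2d_macs(16, 32, 3, 28, 28))  # 3612672
```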

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
neural network computations, covering the advantages and disadvantages of current methods …
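
Most of the methods surveyed build on uniform affine quantization; here is a minimal NumPy sketch of that baseline, assuming a simple min/max calibration of the range (the function names are ours, not the chapter's).

```python
import numpy as np

# Sketch of uniform affine (asymmetric) quantization to signed 8-bit,
# the basic scheme most surveyed methods build on. The range is taken
# from the tensor itself, i.e. a simple min/max calibration.
def quantize(x, num_bits=8):
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = np.round(qmin - x.min() / scale).astype(np.int32)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(1000).astype(np.float32)
q, s, z = quantize(x)
print(np.abs(x - dequantize(q, s, z)).max())  # ~ s/2 away from the range edges
```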

A white paper on neural network quantization

M Nagel, M Fournarakis, RA Amjad… - arXiv preprint arXiv …, 2021 - arxiv.org
While neural networks have advanced the frontiers in many applications, they often come at
a high computational cost. Reducing the power and latency of neural network inference is …
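
In practice, the latency and power savings discussed here are usually obtained by simulating quantization during training. Below is a hedged sketch of a fake-quantization step with the straight-through estimator; the symmetric range and 8-bit width are our assumptions, not the paper's exact recipe.

```python
import torch

# Sketch: symmetric "fake" quantization with a straight-through estimator,
# the basic building block of quantization-aware training. Bit width and
# range handling are illustrative assumptions.
def fake_quant(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max() / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward uses q, backward sees identity.
    return x + (q - x).detach()

w = torch.randn(4, 4, requires_grad=True)
y = fake_quant(w).sum()
y.backward()
print(w.grad)  # all ones: gradients pass straight through the rounding
```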

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
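
The simplest member of the pruning family this survey catalogs is global magnitude pruning; a minimal sketch, with an arbitrary 90% sparsity target:

```python
import torch

# Sketch: global unstructured magnitude pruning, the simplest scheme in
# the family this survey catalogs. The 90% sparsity target is arbitrary.
def magnitude_prune(tensors, sparsity=0.9):
    all_weights = torch.cat([t.abs().flatten() for t in tensors])
    threshold = torch.quantile(all_weights, sparsity)
    return [t * (t.abs() > threshold) for t in tensors]

weights = [torch.randn(64, 32), torch.randn(32, 10)]
pruned = magnitude_prune(weights)
kept = sum((t != 0).sum().item() for t in pruned)
total = sum(t.numel() for t in weights)
print(f"density after pruning: {kept / total:.2%}")  # roughly 10%
```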

HAWQ-V3: Dyadic neural network quantization

Z Yao, Z Dong, Z Zheng, A Gholami… - International …, 2021 - proceedings.mlr.press
Current low-precision quantization algorithms often have the hidden cost of conversion back
and forth from floating point to quantized integer values. This hidden cost limits the latency …
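
The "dyadic" trick is to approximate every requantization scale by b / 2^c with integers b and c, so rescaling needs only an integer multiply and a shift. A toy version, where the 16-bit shift width and 32-bit accumulator are our assumptions rather than the paper's code:

```python
# Sketch: approximate a floating-point requantization scale by a dyadic
# number b / 2**shift, so rescaling becomes an integer multiply and shift.
def dyadic_approx(scale: float, shift: int = 16):
    b = round(scale * (1 << shift))
    return b, shift  # scale ~= b / 2**shift

def requantize(acc: int, b: int, shift: int) -> int:
    # Integer-only rescaling with round-to-nearest via a half-ULP offset.
    return (acc * b + (1 << (shift - 1))) >> shift

scale = 0.0123
b, c = dyadic_approx(scale)
acc = 20000  # e.g. an int32 accumulator from an integer matmul
print(requantize(acc, b, c), acc * scale)  # 246 vs 246.0
```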

Pruning vs quantization: Which is better?

A Kuzmin, M Nagel, M Van Baalen… - Advances in neural …, 2023 - proceedings.neurips.cc
Neural network pruning and quantization techniques are almost as old as neural networks
themselves. However, to date, only ad hoc comparisons between the two have been …
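
A toy version of the question: compress one tensor to a nominally equal budget by 4-bit quantization and by pruning to 25% density, then compare reconstruction error. The Gaussian tensor and the budget pairing are invented for illustration and say nothing about the paper's actual protocol or conclusions.

```python
import numpy as np

# Toy sketch: at an equal nominal storage budget, which transform loses
# less signal on one tensor? Purely illustrative shapes and budgets.
rng = np.random.default_rng(0)
w = rng.standard_normal(10_000).astype(np.float32)

# Quantize to 4 bits, symmetric uniform.
qmax = 2 ** 3 - 1
scale = np.abs(w).max() / qmax
w_quant = np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

# Prune to the same nominal budget: keep the top 25% magnitudes.
threshold = np.quantile(np.abs(w), 0.75)
w_pruned = np.where(np.abs(w) > threshold, w, 0.0)

print("quantization MSE:", np.mean((w - w_quant) ** 2))
print("pruning MSE:     ", np.mean((w - w_pruned) ** 2))
```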

COIN: Compression with implicit neural representations

E Dupont, A Goliński, M Alizadeh, YW Teh… - arXiv preprint arXiv …, 2021 - arxiv.org
We propose a simple new approach for image compression: instead of storing the RGB
values for each pixel of an image, we store the weights of a neural network overfitted to the …
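
The core idea can be sketched in a few lines: overfit a small coordinate-to-RGB MLP on one image and keep its weights as the "file". The paper uses sinusoidal activations and further compresses the weights; the plain ReLU network, tiny random image, and step count below are simplifying assumptions.

```python
import torch
import torch.nn as nn

# Sketch of the COIN idea: overfit a small MLP mapping pixel coordinates
# to RGB, then store its weights instead of the pixels. The tiny random
# "image", layer sizes and step count are placeholders.
h, w = 16, 16
image = torch.rand(h, w, 3)
ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                        indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
targets = image.reshape(-1, 3)

net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss = ((net(coords) - targets) ** 2).mean()
    loss.backward()
    opt.step()

# The "compressed file" is just the state dict of the network.
n_params = sum(p.numel() for p in net.parameters())
print(f"final MSE {loss.item():.4f}, stored parameters: {n_params}")
```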

Understanding and overcoming the challenges of efficient transformer quantization

Y Bondarenko, M Nagel, T Blankevoort - arXiv preprint arXiv:2109.12948, 2021 - arxiv.org
Transformer-based architectures have become the de facto standard models for a wide
range of Natural Language Processing tasks. However, their memory footprint and high …
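
One difficulty this line of work highlights is that a few outlier activation values stretch a per-tensor quantization range and crush the resolution available to everything else. The synthetic sketch below reproduces that effect; the outlier magnitude and shapes are made up.

```python
import numpy as np

# Sketch of the outlier problem: a single large-magnitude channel
# dominates a per-tensor quantization range. Synthetic activations.
rng = np.random.default_rng(0)
acts = rng.standard_normal((128, 64)).astype(np.float32)
acts[:, 0] *= 60.0  # one outlier channel, as reported for transformers

def quant_error(x, axis=None):
    qmax = 127
    scale = np.abs(x).max(axis=axis, keepdims=axis is not None) / qmax
    q = np.clip(np.round(x / scale), -128, qmax) * scale
    return np.mean((x - q) ** 2)

print("per-tensor  MSE:", quant_error(acts))          # inflated by the outlier
print("per-channel MSE:", quant_error(acts, axis=0))  # much smaller
```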

Only train once: A one-shot neural network training and pruning framework

T Chen, B Ji, T Ding, B Fang, G Wang… - Advances in …, 2021 - proceedings.neurips.cc
Structured pruning is a commonly used technique in deploying deep neural networks
(DNNs) onto resource-constrained devices. However, the existing pruning methods are …
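
The structural step behind such one-shot frameworks can be illustrated as follows: once training has driven a hidden unit's whole parameter group to zero, the unit can be sliced out of both adjacent layers without changing the network's outputs. The shapes and the hand-zeroed groups below are illustrative, not the paper's algorithm.

```python
import torch
import torch.nn as nn

# Sketch: delete hidden units whose whole parameter group (fan-in row
# plus bias) is zero; the function is preserved because ReLU(0) = 0.
fc1, fc2 = nn.Linear(8, 6), nn.Linear(6, 4)
with torch.no_grad():
    fc1.weight[[1, 4]] = 0.0  # pretend training zeroed these groups
    fc1.bias[[1, 4]] = 0.0

x = torch.randn(3, 8)
before = fc2(torch.relu(fc1(x)))

alive = (fc1.weight.abs().sum(dim=1) + fc1.bias.abs()) > 0
slim1 = nn.Linear(8, int(alive.sum()))
slim2 = nn.Linear(int(alive.sum()), 4)
with torch.no_grad():
    slim1.weight.copy_(fc1.weight[alive]); slim1.bias.copy_(fc1.bias[alive])
    slim2.weight.copy_(fc2.weight[:, alive]); slim2.bias.copy_(fc2.bias)

after = slim2(torch.relu(slim1(x)))
print(torch.allclose(before, after, atol=1e-6))  # True: outputs unchanged
```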