Model compression and hardware acceleration for neural networks: A comprehensive survey

L Deng, G Li, S Han, L Shi, Y Xie - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Domain-specific hardware is becoming a promising topic in the backdrop of improvement
slow down for general-purpose processors due to the foreseeable end of Moore's Law …

A comprehensive survey on model compression and acceleration

T Choudhary, V Mishra, A Goswami… - Artificial Intelligence …, 2020 - Springer
In recent years, machine learning (ML) and deep learning (DL) have shown remarkable
improvement in computer vision, natural language processing, stock prediction, forecasting …

Pruning and quantization for deep neural network acceleration: A survey

T Liang, J Glossner, L Wang, S Shi, X Zhang - Neurocomputing, 2021 - Elsevier
Deep neural networks have been applied in many applications exhibiting extraordinary
abilities in the field of computer vision. However, complex network architectures challenge …

Differentiable soft quantization: Bridging full-precision and low-bit neural networks

R Gong, X Liu, S Jiang, T Li, P Hu… - Proceedings of the …, 2019 - openaccess.thecvf.com
Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently
accelerate the inference and meanwhile reduce memory consumption of the deep neural …

PACT: Parameterized clipping activation for quantized neural networks

J Choi, Z Wang, S Venkataramani, PIJ Chuang… - arXiv preprint arXiv …, 2018 - arxiv.org
Deep learning algorithms achieve high classification accuracy at the expense of significant
computation cost. To address this cost, a number of quantization schemes have been …

Learning to quantize deep networks by optimizing quantization intervals with task loss

S Jung, C Son, S Lee, J Son, JJ Han… - Proceedings of the …, 2019 - openaccess.thecvf.com
Reducing bit-widths of activations and weights of deep networks makes it efficient to
compute and store them in memory, which is crucial in their deployments to resource-limited …

Accurate and efficient 2-bit quantized neural networks

J Choi, S Venkataramani… - Proceedings of …, 2019 - proceedings.mlsys.org
Deep learning algorithms achieve high classification accuracy at the expense of significant
computation cost. In order to reduce this cost, several quantization schemes have gained …

Compression of deep learning models for text: A survey

M Gupta, P Agrawal - ACM Transactions on Knowledge Discovery from …, 2022 - dl.acm.org
In recent years, the fields of natural language processing (NLP) and information retrieval (IR)
have made tremendous progress thanks to deep learning models like Recurrent Neural …

AdaBits: Neural network quantization with adaptive bit-widths

Q Jin, L Yang, Z Liao - … of the IEEE/CVF Conference on …, 2020 - openaccess.thecvf.com
Deep neural networks with adaptive configurations have gained increasing attention due to
the instant and flexible deployment of these models on platforms with different resource …

Energy-efficient neural network accelerator based on outlier-aware low-precision computation

E Park, D Kim, S Yoo - 2018 ACM/IEEE 45th Annual …, 2018 - ieeexplore.ieee.org
Owing to the presence of large values, which we call outliers, conventional methods of
quantization fail to achieve significantly low precision, e.g., four bits, for very deep neural …