A comprehensive survey on model compression and acceleration

T Choudhary, V Mishra, A Goswami… - Artificial Intelligence …, 2020 - Springer
In recent years, machine learning (ML) and deep learning (DL) have shown remarkable
improvement in computer vision, natural language processing, stock prediction, forecasting …

FPGA-based accelerators of deep learning networks for learning and classification: A review

A Shawahna, SM Sait, A El-Maleh - IEEE Access, 2018 - ieeexplore.ieee.org
Due to recent advances in digital technologies and the availability of credible data, deep
learning, an area of artificial intelligence, has emerged and has demonstrated its ability and …

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com
This chapter surveys approaches to the problem of quantizing the numerical values in deep
neural network computations, covering the advantages/disadvantages of current methods …
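As a concrete reference point for the methods such surveys cover, here is a minimal quantize-dequantize sketch of symmetric uniform quantization in plain NumPy; the 8-bit width and max-abs calibration are illustrative choices, not taken from the chapter:

    import numpy as np

    def uniform_quantize(x, num_bits=8):
        # Symmetric uniform quantizer: map floats to signed integers in
        # [-2^(b-1), 2^(b-1) - 1], then back to floats ("fake quantization").
        qn, qp = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
        scale = np.abs(x).max() / qp              # max-abs calibration
        q = np.clip(np.round(x / scale), qn, qp)  # quantize
        return q * scale                          # dequantize

    x = np.random.randn(4, 4).astype(np.float32)
    print(np.abs(x - uniform_quantize(x)).max())  # worst-case rounding error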

Post-training quantization for vision transformer

Z Liu, Y Wang, K Han, W Zhang… - Advances in Neural …, 2021 - proceedings.neurips.cc
Recently, transformers have achieved remarkable performance on a variety of computer vision
applications. Compared with mainstream convolutional neural networks, vision transformers …

Learned step size quantization

SK Esser, JL McKinstry, D Bablani… - arXiv preprint arXiv …, 2019 - arxiv.org
Deep networks that run with low-precision operations at inference time offer power and space
advantages over high-precision alternatives, but must overcome the challenge of …
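The key move in LSQ is making the quantizer step size a trainable parameter: rounding passes gradients straight through, and the step size receives a scaled gradient for stable training. A minimal PyTorch sketch under those assumptions (the gradient-scale and initialization formulas follow the paper; the 4-bit signed range is an illustrative choice):

    import torch

    def lsq_quantize(x, step, num_bits=4):
        qn, qp = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
        # Forward value of s equals step; its gradient is scaled by g.
        g = 1.0 / (x.numel() * qp) ** 0.5
        s = step * g + (step - step * g).detach()
        v = torch.clamp(x / s, qn, qp)
        v = v + (v.round() - v).detach()  # round with straight-through gradient
        return v * s

    w = torch.randn(64, 64, requires_grad=True)
    step = torch.tensor(2 * w.abs().mean().item() / 7 ** 0.5, requires_grad=True)
    lsq_quantize(w, step).sum().backward()  # step.grad is now populated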

FINN: A framework for fast, scalable binarized neural network inference

Y Umuroglu, NJ Fraser, G Gambardella… - Proceedings of the …, 2017 - dl.acm.org
Research has shown that convolutional neural networks contain significant redundancy, and
high classification accuracy can be obtained even when weights and activations are …
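With weights and activations binarized to {-1, +1}, the dot products FINN maps onto FPGA fabric reduce to XNOR followed by popcount. A pure-Python sketch of that identity (the bit-packing convention, bit 1 for +1 and bit 0 for -1, is an assumption for the example):

    def binary_dot(a_bits, w_bits, n):
        # a_bits / w_bits: integers whose n low bits encode {-1, +1} vectors.
        xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)  # 1 where the signs agree
        matches = bin(xnor).count("1")              # popcount
        return 2 * matches - n                      # agreements minus disagreements

    # a = [+1, -1, +1], w = [+1, +1, -1]  ->  (+1) + (-1) + (-1) = -1
    print(binary_dot(0b101, 0b011, 3))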

Improving neural network quantization without retraining using outlier channel splitting

R Zhao, Y Hu, J Dotzel, C De Sa… - … conference on machine …, 2019 - proceedings.mlr.press
Quantization can improve the execution latency and energy efficiency of neural networks on
both commodity GPUs and specialized accelerators. The majority of existing literature …
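Outlier channel splitting keeps the network function intact while shrinking the range a quantizer must cover: the channel containing an outlier is duplicated and both copies are halved. A minimal NumPy sketch for a fully connected weight matrix (function and argument names here are illustrative, not the paper's):

    import numpy as np

    def split_outlier_channels(W, n_splits=1):
        # W: [out_features, in_features]. Split the input channel holding
        # the largest-magnitude weight into two half-weight copies.
        W = W.copy()
        for _ in range(n_splits):
            c = np.unravel_index(np.argmax(np.abs(W)), W.shape)[1]
            half = W[:, c:c + 1] / 2.0
            W[:, c] = half[:, 0]
            W = np.concatenate([W, half], axis=1)
        return W  # equivalent once input channel c is duplicated upstream

    W = np.array([[0.1, 4.0], [0.2, -0.3]])
    print(split_outlier_channels(W))  # max |weight| drops from 4.0 to 2.0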

PFGM++: Unlocking the potential of physics-inspired generative models

Y Xu, Z Liu, Y Tian, S Tong… - International …, 2023 - proceedings.mlr.press
We introduce a new family of physics-inspired generative models termed PFGM++ that
unifies diffusion models and Poisson Flow Generative Models (PFGM). These models …

Structured pruning of deep convolutional neural networks

S Anwar, K Hwang, W Sung - ACM Journal on Emerging Technologies in …, 2017 - dl.acm.org
Real-time application of deep learning algorithms is often hindered by high computational
complexity and frequent memory accesses. Network pruning is a promising technique to …
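For intuition, the sketch below performs filter-level structured pruning with a simple L1-magnitude criterion, which keeps the weight tensor dense and hardware friendly; note that this magnitude criterion is a common simplification, not the particle-filtering selection the paper itself proposes:

    import numpy as np

    def prune_filters_l1(W, keep_ratio=0.5):
        # W: conv weights [out_channels, in_channels, kh, kw].
        # Drop whole output filters with the smallest L1 norms.
        norms = np.abs(W.reshape(W.shape[0], -1)).sum(axis=1)
        n_keep = max(1, int(round(W.shape[0] * keep_ratio)))
        keep = np.sort(np.argsort(norms)[-n_keep:])
        return W[keep], keep  # pruned tensor + surviving filter indices

    W = np.random.randn(16, 3, 3, 3)
    W_pruned, kept = prune_filters_l1(W, keep_ratio=0.25)
    print(W_pruned.shape)  # (4, 3, 3, 3)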

LSQ+: Improving low-bit quantization through learnable offsets and better initialization

Y Bhalgat, J Lee, M Nagel… - Proceedings of the …, 2020 - openaccess.thecvf.com
Unlike ReLU, newer activation functions (like Swish, H-swish, Mish) that are frequently
employed in popular efficient architectures can also result in negative activation values, with …
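LSQ+ addresses exactly this: alongside the learnable scale it adds a learnable offset, so an asymmetric quantization grid can cover the negative tail of Swish-like activations instead of clipping it. A minimal PyTorch sketch of the asymmetric quantize-dequantize step (the unsigned 4-bit range and min/max-style initialization are illustrative assumptions, and the paper's gradient scaling is omitted for brevity):

    import torch

    def lsq_plus_quantize(x, scale, offset, num_bits=4):
        qn, qp = 0, 2 ** num_bits - 1
        # Learnable offset (beta) shifts the grid so negative values survive.
        v = torch.clamp((x - offset) / scale, qn, qp)
        v = v + (v.round() - v).detach()  # straight-through rounding
        return v * scale + offset         # dequantize back to float

    x = torch.nn.functional.silu(torch.randn(1024))  # Swish: negative tail
    scale = torch.tensor(((x.max() - x.min()) / 15).item(), requires_grad=True)
    offset = torch.tensor(x.min().item(), requires_grad=True)
    lsq_plus_quantize(x, scale, offset).sum().backward()  # grads for both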