An overview of neural network compression

J O'Neill - arXiv preprint arXiv:2006.03669, 2020 - arxiv.org
Overparameterized networks trained to convergence have shown impressive performance
in domains such as computer vision and natural language processing. Pushing state of the …

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
neural network computations, covering the advantages/disadvantages of current methods …
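
For orientation, the basic object such surveys study is uniform affine quantization, which maps floating-point values onto a small integer grid via a scale and zero-point. The NumPy sketch below is an illustrative reconstruction of that standard scheme, not code from the survey; the function names and the 8-bit default are assumptions.

    import numpy as np

    def quantize_uniform(x, num_bits=8):
        # Uniform affine quantization: map floats in [x.min(), x.max()]
        # onto the integer grid [0, 2**num_bits - 1] via scale and zero-point.
        qmin, qmax = 0, 2 ** num_bits - 1
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = np.clip(np.round(-x.min() / scale), qmin, qmax)
        return np.clip(np.round(x / scale) + zero_point, qmin, qmax), scale, zero_point

    def dequantize(q, scale, zero_point):
        # Recover an approximation of the original floats.
        return scale * (q - zero_point)

    w = np.random.randn(4, 4).astype(np.float32)
    q, s, z = quantize_uniform(w)
    print(np.abs(w - dequantize(q, s, z)).max())  # roughly bounded by scale / 2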

LeViT: a vision transformer in ConvNet's clothing for faster inference

B Graham, A El-Nouby, H Touvron… - Proceedings of the …, 2021 - openaccess.thecvf.com
We design a family of image classification architectures that optimize the trade-off between
accuracy and efficiency in a high-speed regime. Our work exploits recent findings in …

A white paper on neural network quantization

M Nagel, M Fournarakis, RA Amjad… - arXiv preprint arXiv …, 2021 - arxiv.org
While neural networks have advanced the frontiers in many applications, they often come at
a high computational cost. Reducing the power and latency of neural network inference is …

Pre-trained models: Past, present and future

X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu… - AI Open, 2021 - Elsevier
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …

CocktailSGD: Fine-tuning foundation models over 500Mbps networks

J Wang, Y Lu, B Yuan, B Chen… - International …, 2023 - proceedings.mlr.press
Distributed training of foundation models, especially large language models (LLMs), is
communication-intensive and so has heavily relied on centralized data centers with fast …

A comprehensive survey on model quantization for deep neural networks in image classification

B Rokh, A Azarpeyvand, A Khanteymoori - ACM Transactions on …, 2023 - dl.acm.org
Recent advancements in machine learning achieved by Deep Neural Networks (DNNs)
have been significant. While demonstrating high accuracy, DNNs are associated with a …

Up or down? Adaptive rounding for post-training quantization

M Nagel, RA Amjad, M Van Baalen… - International …, 2020 - proceedings.mlr.press
When quantizing neural networks, assigning each floating-point weight to its nearest
fixed-point value is the predominant approach. We find that, perhaps surprisingly, this is not the …
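
The "nearest fixed-point value" baseline the abstract questions is plain round-to-nearest; AdaRound instead lets each weight round either up or down and learns that choice by minimizing a per-layer reconstruction loss. The sketch below contrasts the two, with a random 0/1 mask standing in for the learned rounding decision; the mask and function names are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def round_to_nearest(w, scale):
        # Predominant baseline: snap each weight to its nearest grid point.
        return np.round(w / scale) * scale

    def round_up_or_down(w, scale, round_up):
        # AdaRound-style freedom: each weight independently rounds its
        # quantized value up or down. The paper learns this choice via a
        # per-layer reconstruction loss; a given 0/1 mask stands in here.
        return (np.floor(w / scale) + round_up) * scale

    scale = 0.1
    w = np.random.randn(8).astype(np.float32)
    mask = (np.random.rand(8) > 0.5).astype(np.float32)  # placeholder, not learned
    print(round_to_nearest(w, scale))
    print(round_up_or_down(w, scale, mask))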

SPINN: synergistic progressive inference of neural networks over device and cloud

S Laskaridis, SI Venieris, M Almeida… - Proceedings of the 26th …, 2020 - dl.acm.org
Despite the soaring use of convolutional neural networks (CNNs) in mobile applications,
uniformly sustaining high-performance inference on mobile has been elusive due to the …

TernaryBERT: Distillation-aware ultra-low bit BERT

W Zhang, L Hou, Y Yin, L Shang, X Chen… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer-based pre-trained models like BERT have achieved remarkable performance
in many natural language processing tasks. However, these models are both computation …
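
For context, ternarization constrains each weight to a scaled value in {-1, 0, +1}. The sketch below follows the classic Ternary Weight Networks threshold rule (Li & Liu, 2016) as a stand-in; TernaryBERT's distillation-aware training loop is not reproduced, and the 0.7 · mean|w| threshold is that earlier heuristic, not a constant from this paper.

    import numpy as np

    def ternarize(w):
        # TWN-style ternarization: approximate w by alpha * t with
        # t in {-1, 0, +1}, zeroing weights below a magnitude threshold.
        delta = 0.7 * np.mean(np.abs(w))             # TWN threshold heuristic
        t = np.where(np.abs(w) > delta, np.sign(w), 0.0)
        alpha = np.abs(w[np.abs(w) > delta]).mean()  # scale fit to kept weights
        return alpha * t

    w = np.random.randn(6, 6).astype(np.float32)
    print(ternarize(w))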