A comprehensive survey on model quantization for deep neural networks in image classification

B Rokh, A Azarpeyvand, A Khanteymoori - ACM Transactions on …, 2023 - dl.acm.org
Recent advancements in machine learning achieved by Deep Neural Networks (DNNs)
have been significant. While demonstrating high accuracy, DNNs are associated with a …
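Not taken from the survey entry above, but for concreteness, here is a minimal sketch of per-tensor symmetric uniform quantization, the basic operation such quantization surveys catalogue. The function names and the 8-bit setting are illustrative assumptions, not the survey's own method.

```python
import numpy as np

def quantize_uniform_symmetric(w, num_bits=8):
    """Quantize a float weight tensor to signed integers with one per-tensor scale.

    Illustrative only: the simplest symmetric scheme covered by quantization surveys.
    """
    qmax = 2 ** (num_bits - 1) - 1                    # e.g. 127 for 8 bits
    scale = max(np.max(np.abs(w)) / qmax, 1e-12)      # map the largest magnitude to qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the integer representation."""
    return q.astype(np.float32) * scale

# Example: quantization error on a random weight matrix
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_uniform_symmetric(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, s))))
```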

A survey on transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …

Unified data-free compression: Pruning and quantization without fine-tuning

S Bai, J Chen, X Shen, Y Qian… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Structured pruning and quantization are promising approaches for reducing the inference
time and memory footprint of neural networks. However, most existing methods require the …

Towards trustworthy dataset distillation

S Ma, F Zhu, Z Cheng, XY Zhang - Pattern Recognition, 2025 - Elsevier
Efficiency and trustworthiness are two eternal pursuits when applying deep learning in
practical scenarios. Considering efficiency, dataset distillation (DD) endeavors to reduce …

Dual teachers for self-knowledge distillation

Z Li, X Li, L Yang, R Song, J Yang, Z Pan - Pattern Recognition, 2024 - Elsevier
We introduce an efficient self-knowledge distillation framework, Dual Teachers for Self-
Knowledge Distillation (DTSKD), where the student receives self-supervisions by dual …
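As background for this entry, the following is a sketch of the standard temperature-scaled distillation loss that self-distillation frameworks build on; it is a generic illustration, not the DTSKD dual-teacher formulation, and the function name is an assumption.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-label knowledge distillation loss (KL divergence on softened outputs).

    Generic sketch only; DTSKD's dual-teacher self-supervision is not reproduced here.
    """
    t = temperature
    soft_targets = F.softmax(teacher_logits / t, dim=-1)
    log_probs = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t**2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * (t * t)

# Example usage with random logits for a batch of 8 samples, 10 classes
student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
print(distillation_loss(student, teacher))
```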

MCMC: Multi-Constrained Model Compression via One-Stage Envelope Reinforcement Learning

S Li, J Chen, S Liu, C Zhu, G Tian… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Model compression methods are being developed to bridge the gap between the massive
scale of neural networks and the limited hardware resources on edge devices. Since most …

Single-shot pruning and quantization for hardware-friendly neural network acceleration

B Jiang, J Chen, Y Liu - Engineering Applications of Artificial Intelligence, 2023 - Elsevier
Applying CNNs to embedded systems is challenging due to model size limitations. Pruning
and quantization can help, but are time-consuming to apply separately. Our Single-Shot …
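To make the two steps named in this entry concrete, the sketch below magnitude-prunes a weight tensor and then quantizes the surviving weights. This is a generic, assumed combination for illustration; the cited single-shot, hardware-friendly method couples the steps differently.

```python
import numpy as np

def prune_then_quantize(w, sparsity=0.5, num_bits=8):
    """Magnitude-prune a weight tensor, then uniformly quantize the survivors.

    Generic illustration of combining pruning and quantization; not the cited method.
    """
    # Unstructured magnitude pruning: zero the smallest-|w| entries.
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) >= threshold
    w_pruned = w * mask

    # Per-tensor symmetric quantization of the remaining weights.
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(np.max(np.abs(w_pruned)) / qmax, 1e-12)
    q = np.clip(np.round(w_pruned / scale), -qmax, qmax).astype(np.int8)
    return q, scale, mask

w = np.random.randn(128, 128).astype(np.float32)
q, scale, mask = prune_then_quantize(w)
print("kept fraction:", mask.mean(),
      "reconstruction error:", np.max(np.abs(w * mask - q.astype(np.float32) * scale)))
```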

MBQuant: A novel multi-branch topology method for arbitrary bit-width network quantization

Y Zhong, Y Zhou, F Chao, R Ji - Pattern Recognition, 2025 - Elsevier
Arbitrary bit-width network quantization has received significant attention due to its high
adaptability to various bit-width requirements during runtime. However, in this paper, we …

PIPE: Parallelized inference through ensembling of residual quantization expansions

E Yvinec, A Dapogny, K Bailly - Pattern Recognition, 2024 - Elsevier
Deep neural networks (DNNs) are ubiquitous in computer vision and natural language
processing, but suffer from high inference cost. This problem can be addressed by …
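The core idea named in this entry, residual quantization expansions, can be sketched as repeatedly quantizing the error left by earlier quantization passes. The code below is a generic illustration under that assumption; PIPE's ensembling and parallelized inference over the expansions are not reproduced.

```python
import numpy as np

def residual_quantize(w, num_bits=4, num_expansions=3):
    """Approximate w as a sum of low-bit terms by quantizing successive residuals."""
    qmax = 2 ** (num_bits - 1) - 1
    residual = w.astype(np.float32)
    terms = []
    for _ in range(num_expansions):
        scale = max(np.max(np.abs(residual)) / qmax, 1e-12)
        q = np.clip(np.round(residual / scale), -qmax, qmax)
        terms.append((q.astype(np.int8), scale))
        residual = residual - q * scale   # error left for the next expansion
    return terms

w = np.random.randn(64, 64).astype(np.float32)
terms = residual_quantize(w)
approx = sum(q.astype(np.float32) * s for q, s in terms)
print("max abs error after 3 expansions:", np.max(np.abs(w - approx)))
```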

Dynamic instance-aware layer-bit-select network on human activity recognition using wearable sensors

N Ye, L Zhang, D Cheng, C Bu, S Sun, H Wu… - Engineering Applications of Artificial Intelligence, 2024 - Elsevier
In recent years, deep convolutional neural networks have achieved remarkable
success in a wide range of sensor-based human activity recognition (HAR) applications …