Structured pruning for deep convolutional neural networks: A survey

Y He, L Xiao - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 - ieeexplore.ieee.org
The remarkable performance of deep convolutional neural networks (CNNs) is generally
attributed to their deeper and wider architectures, which can come with significant …
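
To make concrete the kind of technique this survey covers, here is a minimal sketch of L1-norm filter (channel) pruning, one of the simplest structured criteria. The layer shapes, keep ratio, and helper layout below are illustrative assumptions, not anything taken from the paper:

    import torch
    import torch.nn as nn

    # Illustrative layer; shapes are arbitrary assumptions.
    conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)

    # Importance of each output filter = L1 norm of its weights.
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # shape: (32,)

    keep_ratio = 0.5                               # assumed pruning budget
    k = int(conv.weight.shape[0] * keep_ratio)
    keep_idx = torch.topk(importance, k).indices   # filters to retain

    # Build a smaller layer holding only the surviving filters.
    pruned = nn.Conv2d(16, k, kernel_size=3)
    pruned.weight.data = conv.weight.data[keep_idx].clone()
    pruned.bias.data = conv.bias.data[keep_idx].clone()

Because whole filters are removed, the pruned layer stays dense and needs no special sparse kernels, which is the practical appeal of structured (as opposed to unstructured) pruning.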

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer Vision, 2022 - taylorfrancis.com
This chapter surveys approaches to quantizing the numerical values in deep
neural network computations, covering the advantages and disadvantages of current methods …
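
As background for the methods compared in such surveys, here is a minimal sketch of the uniform affine quantizer most approaches build on; the function name, bit-width, and input tensor are illustrative assumptions:

    import numpy as np

    def quantize_uniform(x, num_bits=8):
        """Uniform affine quantization: q = clip(round(x/s) + z); x_hat = s*(q - z)."""
        qmin, qmax = 0, 2 ** num_bits - 1
        s = (x.max() - x.min()) / (qmax - qmin)        # scale
        z = np.round(qmin - x.min() / s)               # zero-point
        q = np.clip(np.round(x / s) + z, qmin, qmax)   # integer codes
        return s * (q - z)                             # dequantized approximation

    x = np.random.randn(1000).astype(np.float32)
    x_hat = quantize_uniform(x, num_bits=4)
    print("mean squared quantization error:", np.mean((x - x_hat) ** 2))

Lower bit-widths shrink memory and enable integer arithmetic at the cost of a larger rounding error, which is the central trade-off the chapter examines.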

A comprehensive survey on model quantization for deep neural networks in image classification

B Rokh, A Azarpeyvand, A Khanteymoori - ACM Transactions on Intelligent Systems and Technology, 2023 - dl.acm.org
Deep Neural Networks (DNNs) have driven significant recent advances in machine
learning. While demonstrating high accuracy, DNNs are associated with a …
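
One representative refinement such surveys discuss is symmetric per-channel weight quantization, which keeps one scale per output channel instead of one per tensor. The sketch below is a generic illustration under assumed shapes, not this paper's specific method:

    import numpy as np

    def quantize_per_channel(w, num_bits=8):
        """Symmetric per-channel weight quantization: one scale per output channel."""
        qmax = 2 ** (num_bits - 1) - 1
        s = np.abs(w).reshape(w.shape[0], -1).max(axis=1) / qmax + 1e-12
        s = s.reshape(-1, 1, 1, 1)                     # broadcast over (in, kH, kW)
        q = np.clip(np.round(w / s), -qmax - 1, qmax)
        return q * s

    w = np.random.randn(32, 16, 3, 3).astype(np.float32)  # assumed conv weight shape
    w_hat = quantize_per_channel(w, num_bits=4)
    print("per-channel 4-bit MSE:", np.mean((w - w_hat) ** 2))

Per-channel scales adapt to each filter's weight range, which typically recovers noticeable classification accuracy at low bit-widths compared with a single per-tensor scale.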

PTQD: Accurate post-training quantization for diffusion models

Y He, L Liu, J Liu, W Wu, H Zhou… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Diffusion models have recently dominated image synthesis and other related generative
tasks. However, the iterative denoising process is computationally expensive at inference …
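
PTQD's specific contribution concerns correcting quantization noise inside the denoising loop; as generic background only, here is a sketch of the calibration step that post-training methods start from. The percentile-clipping rule and all names are assumptions for illustration:

    import numpy as np

    def calibrate_and_quantize(acts, num_bits=8, pct=99.9):
        """Generic post-training activation quantization: fix a clipping range
        from calibration samples (percentile here), then quantize with it."""
        lo, hi = np.percentile(acts, 100.0 - pct), np.percentile(acts, pct)
        qmax = 2 ** num_bits - 1
        s = (hi - lo) / qmax                          # scale for the fixed range
        q = np.clip(np.round((acts - lo) / s), 0, qmax)
        return s * q + lo                             # dequantized approximation

    calib = np.random.randn(10000)  # stand-in for activations from a few denoising steps
    print("std after 8-bit PTQ:", calibrate_and_quantize(calib).std())

Because no retraining is involved, only a small calibration set is needed, which is what makes post-training quantization attractive for expensive models like diffusion denoisers.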

QLLM: Accurate and efficient low-bitwidth quantization for large language models

J Liu, R Gong, X Wei, Z Dong, J Cai… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) excel in NLP, but their computational and memory demands
hinder their widespread deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive …
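
A common ingredient in low-bitwidth LLM quantization (though not QLLM's specific channel-reassembly technique) is group-wise weight quantization, with one scale per small group of weights. A minimal sketch under assumed sizes:

    import numpy as np

    def quantize_groupwise(w_row, num_bits=4, group_size=128):
        """Group-wise symmetric quantization of one weight row: one scale per
        group, as commonly used for low-bitwidth LLM weights. Assumes the row
        length is divisible by group_size."""
        qmax = 2 ** (num_bits - 1) - 1
        groups = w_row.reshape(-1, group_size)
        s = np.abs(groups).max(axis=1, keepdims=True) / qmax + 1e-12
        q = np.clip(np.round(groups / s), -qmax - 1, qmax)
        return (q * s).reshape(w_row.shape)

    row = np.random.randn(4096).astype(np.float32)  # one row of a projection matrix
    print("4-bit group-wise MSE:", np.mean((row - quantize_groupwise(row)) ** 2))

Smaller groups localize the damage from outlier weights, a key difficulty in LLM quantization, at the cost of storing more scales.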

Automatic network pruning via Hilbert-Schmidt independence criterion Lasso under information bottleneck principle

S Guo, L Zhang, X Zheng, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Most existing neural network pruning methods hand-craft their importance criteria and
the structures to prune, which creates heavy and unintended dependencies on heuristics and …
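
The paper's criterion builds on the Hilbert-Schmidt independence criterion (HSIC). As a sketch of the underlying quantity only, the empirical HSIC between a channel's activations and the labels is trace(KHLH)/(n-1)^2 with kernel Gram matrices K, L and centering matrix H. The data below is synthetic and the kernel bandwidth is an arbitrary assumption:

    import numpy as np

    def rbf_gram(x, sigma=1.0):
        """RBF kernel Gram matrix for a batch of (flattened) feature vectors."""
        sq = np.sum(x ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2 * x @ x.T
        return np.exp(-d2 / (2 * sigma ** 2))

    def hsic(x, y):
        """Empirical HSIC: trace(KHLH)/(n-1)^2; higher = stronger dependence."""
        n = x.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        K, L = rbf_gram(x), rbf_gram(y)
        return np.trace(K @ H @ L @ H) / (n - 1) ** 2

    # Score a channel's feature maps against one-hot labels on a batch.
    feats = np.random.randn(64, 100)                 # hypothetical channel activations
    labels = np.eye(10)[np.random.randint(0, 10, 64)]
    print("channel relevance (HSIC):", hsic(feats, labels))

Channels whose activations carry little statistical dependence on the labels are natural pruning candidates, which lets the criterion be learned from data rather than hand-crafted.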

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion models, and LLM-based multimodal models, are revolutionizing the entire machine …

SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute

N Zheng, B Lin, Q Zhang, L Ma, Y Yang… - 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), 2022 - usenix.org
Sparsity is becoming arguably the most critical dimension to explore for efficiency and
scalability, as deep learning models grow significantly larger and more complex. After all …
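
SparTA itself is a compiler-level system, but the core idea (once a tensor's sparsity pattern is known as an attribute, execution can be specialized to it) can be illustrated with an off-the-shelf sparse format. The sparsity level and sizes below are assumptions:

    import numpy as np
    from scipy import sparse

    rng = np.random.default_rng(0)
    dense = rng.standard_normal((2048, 2048))
    dense[rng.random(dense.shape) < 0.95] = 0.0   # 95%-sparse weight, assumed pattern

    w_csr = sparse.csr_matrix(dense)              # storage specialized to the pattern
    x = rng.standard_normal((2048, 64))

    out_dense = dense @ x                         # dense kernel, ignores sparsity
    out_sparse = w_csr @ x                        # sparse kernel, exploits it
    print(np.allclose(out_dense, out_sparse))

The dense kernel does the same arithmetic on mostly-zero operands, while the sparse one skips them; SparTA generalizes this by propagating sparsity attributes through the whole computation graph and choosing specialized kernels end to end.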

Hessian-aware pruning and optimal neural implant

S Yu, Z Yao, A Gholami, Z Dong… - Proceedings of the …, 2022 - openaccess.thecvf.com
Pruning is an effective method to reduce the memory footprint and FLOPs associated with
neural network models. However, existing structured pruning methods often result in …
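
As background on second-order saliency, the classic idea that Hessian-aware methods refine: zeroing weight i changes the loss by approximately 0.5 * H_ii * w_i^2. The sketch below uses an empirical-Fisher proxy for the Hessian diagonal, which is an assumption for illustration, not this paper's estimator:

    import numpy as np

    def second_order_importance(w, h_diag):
        """OBD-style saliency: estimated loss increase from zeroing each weight."""
        return 0.5 * h_diag * w ** 2

    w = np.random.randn(1000)
    grads = np.random.randn(32, 1000)             # per-example gradients (stand-in)
    h_diag = (grads ** 2).mean(axis=0)            # empirical-Fisher proxy for diag(H)

    scores = second_order_importance(w, h_diag)
    prune_idx = np.argsort(scores)[: int(0.3 * w.size)]  # 30% lowest-saliency weights
    w[prune_idx] = 0.0

Ranking by curvature-weighted magnitude, rather than magnitude alone, protects weights that sit in sharp directions of the loss, which is why second-order criteria tend to prune more safely.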

Improving low-precision network quantization via bin regularization

T Han, D Li, J Liu, L Tian… - Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021 - openaccess.thecvf.com
Model quantization is an important mechanism for energy-efficient deployment of
deep neural networks on resource-constrained devices by reducing the bit precision of …
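
The bin regularization idea can be sketched as an auxiliary loss that pulls each full-precision weight toward its nearest quantization bin center, sharpening the weight distribution around the bins. This simplified per-tensor version is an illustration, not the paper's exact formulation:

    import torch

    def bin_regularizer(w, num_bits=4):
        """Penalize each weight's distance to its nearest uniform bin center."""
        qmax = 2 ** (num_bits - 1) - 1
        s = w.detach().abs().max() / qmax              # symmetric uniform bins
        nearest = torch.round(w / s).clamp(-qmax - 1, qmax) * s
        return ((w - nearest) ** 2).mean()

    w = torch.randn(1000, requires_grad=True)
    loss = bin_regularizer(w)      # added to the task loss during training
    loss.backward()                # gradients pull weights toward bin centers

Because the penalty shrinks the weight-to-bin gap before quantization is applied, the eventual rounding step discards less information, which is how such regularizers improve low-precision accuracy.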