Structured pruning for deep convolutional neural networks: A survey

Y He, L Xiao - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 - ieeexplore.ieee.org
The remarkable performance of deep convolutional neural networks (CNNs) is generally
attributed to their deeper and wider architectures, which can come with significant …
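
To make concrete the kind of technique this survey covers, here is a minimal sketch of L1-norm filter (channel) pruning, one of the simplest structured criteria. The layer shapes, keep ratio, and helper layout below are illustrative assumptions, not anything taken from the paper:

    import torch
    import torch.nn as nn

    # Illustrative layer; shapes are arbitrary assumptions.
    conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)

    # Importance of each output filter = L1 norm of its weights.
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # shape: (32,)

    keep_ratio = 0.5                               # assumed pruning budget
    k = int(conv.weight.shape[0] * keep_ratio)
    keep_idx = torch.topk(importance, k).indices   # filters to retain

    # Build a smaller layer holding only the surviving filters.
    pruned = nn.Conv2d(16, k, kernel_size=3)
    pruned.weight.data = conv.weight.data[keep_idx].clone()
    pruned.bias.data = conv.bias.data[keep_idx].clone()

Because whole filters are removed, the pruned layer stays dense and needs no special sparse kernels, which is the practical appeal of structured (as opposed to unstructured) pruning.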

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer Vision, 2022 - taylorfrancis.com
This chapter surveys approaches to quantizing the numerical values in deep
neural network computations, covering the advantages and disadvantages of current methods …
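
As background for the methods compared in such surveys, here is a minimal sketch of the uniform affine quantizer most approaches build on; the function name, bit-width, and input tensor are illustrative assumptions:

    import numpy as np

    def quantize_uniform(x, num_bits=8):
        """Uniform affine quantization: q = clip(round(x/s) + z); x_hat = s*(q - z)."""
        qmin, qmax = 0, 2 ** num_bits - 1
        s = (x.max() - x.min()) / (qmax - qmin)        # scale
        z = np.round(qmin - x.min() / s)               # zero-point
        q = np.clip(np.round(x / s) + z, qmin, qmax)   # integer codes
        return s * (q - z)                             # dequantized approximation

    x = np.random.randn(1000).astype(np.float32)
    x_hat = quantize_uniform(x, num_bits=4)
    print("mean squared quantization error:", np.mean((x - x_hat) ** 2))

Lower bit-widths shrink memory and enable integer arithmetic at the cost of a larger rounding error, which is the central trade-off the chapter examines.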

A comprehensive survey on model quantization for deep neural networks in image classification

B Rokh, A Azarpeyvand, A Khanteymoori - ACM Transactions on Intelligent Systems and Technology, 2023 - dl.acm.org
Deep Neural Networks (DNNs) have driven significant recent advances in machine
learning. While demonstrating high accuracy, DNNs are associated with a …
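
One representative refinement such surveys discuss is symmetric per-channel weight quantization, which keeps one scale per output channel instead of one per tensor. The sketch below is a generic illustration under assumed shapes, not this paper's specific method:

    import numpy as np

    def quantize_per_channel(w, num_bits=8):
        """Symmetric per-channel weight quantization: one scale per output channel."""
        qmax = 2 ** (num_bits - 1) - 1
        s = np.abs(w).reshape(w.shape[0], -1).max(axis=1) / qmax + 1e-12
        s = s.reshape(-1, 1, 1, 1)                     # broadcast over (in, kH, kW)
        q = np.clip(np.round(w / s), -qmax - 1, qmax)
        return q * s

    w = np.random.randn(32, 16, 3, 3).astype(np.float32)  # assumed conv weight shape
    w_hat = quantize_per_channel(w, num_bits=4)
    print("per-channel 4-bit MSE:", np.mean((w - w_hat) ** 2))

Per-channel scales adapt to each filter's weight range, which typically recovers noticeable classification accuracy at low bit-widths compared with a single per-tensor scale.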

PTQD: Accurate post-training quantization for diffusion models

Y He, L Liu, J Liu, W Wu, H Zhou… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Diffusion models have recently dominated image synthesis and other related generative
tasks. However, the iterative denoising process is computationally expensive at inference …
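
PTQD's specific contribution concerns correcting quantization noise inside the denoising loop; as generic background only, here is a sketch of the calibration step that post-training methods start from. The percentile-clipping rule and all names are assumptions for illustration:

    import numpy as np

    def calibrate_and_quantize(acts, num_bits=8, pct=99.9):
        """Generic post-training activation quantization: fix a clipping range
        from calibration samples (percentile here), then quantize with it."""
        lo, hi = np.percentile(acts, 100.0 - pct), np.percentile(acts, pct)
        qmax = 2 ** num_bits - 1
        s = (hi - lo) / qmax                          # scale for the fixed range
        q = np.clip(np.round((acts - lo) / s), 0, qmax)
        return s * q + lo                             # dequantized approximation

    calib = np.random.randn(10000)  # stand-in for activations from a few denoising steps
    print("std after 8-bit PTQ:", calibrate_and_quantize(calib).std())

Because no retraining is involved, only a small calibration set is needed, which is what makes post-training quantization attractive for expensive models like diffusion denoisers.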

QLLM: Accurate and efficient low-bitwidth quantization for large language models

J Liu, R Gong, X Wei, Z Dong, J Cai… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) excel in NLP, but their computational and memory demands
hinder their widespread deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive …
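
A common ingredient in low-bitwidth LLM quantization (though not QLLM's specific channel-reassembly technique) is group-wise weight quantization, with one scale per small group of weights. A minimal sketch under assumed sizes:

    import numpy as np

    def quantize_groupwise(w_row, num_bits=4, group_size=128):
        """Group-wise symmetric quantization of one weight row: one scale per
        group, as commonly used for low-bitwidth LLM weights. Assumes the row
        length is divisible by group_size."""
        qmax = 2 ** (num_bits - 1) - 1
        groups = w_row.reshape(-1, group_size)
        s = np.abs(groups).max(axis=1, keepdims=True) / qmax + 1e-12
        q = np.clip(np.round(groups / s), -qmax - 1, qmax)
        return (q * s).reshape(w_row.shape)

    row = np.random.randn(4096).astype(np.float32)  # one row of a projection matrix
    print("4-bit group-wise MSE:", np.mean((row - quantize_groupwise(row)) ** 2))

Smaller groups localize the damage from outlier weights, a key difficulty in LLM quantization, at the cost of storing more scales.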

Automatic network pruning via Hilbert-Schmidt independence criterion Lasso under information bottleneck principle

S Guo, L Zhang, X Zheng, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Most existing neural network pruning methods hand-craft their importance criteria and
the structures to prune, which creates heavy and unintended dependencies on heuristics and …
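
The paper's criterion builds on the Hilbert-Schmidt independence criterion (HSIC). As a sketch of the underlying quantity only, the empirical HSIC between a channel's activations and the labels is trace(KHLH)/(n-1)^2 with kernel Gram matrices K, L and centering matrix H. The data below is synthetic and the kernel bandwidth is an arbitrary assumption:

    import numpy as np

    def rbf_gram(x, sigma=1.0):
        """RBF kernel Gram matrix for a batch of (flattened) feature vectors."""
        sq = np.sum(x ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2 * x @ x.T
        return np.exp(-d2 / (2 * sigma ** 2))

    def hsic(x, y):
        """Empirical HSIC: trace(KHLH)/(n-1)^2; higher = stronger dependence."""
        n = x.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        K, L = rbf_gram(x), rbf_gram(y)
        return np.trace(K @ H @ L @ H) / (n - 1) ** 2

    # Score a channel's feature maps against one-hot labels on a batch.
    feats = np.random.randn(64, 100)                 # hypothetical channel activations
    labels = np.eye(10)[np.random.randint(0, 10, 64)]
    print("channel relevance (HSIC):", hsic(feats, labels))

Channels whose activations carry little statistical dependence on the labels are natural pruning candidates, which lets the criterion be learned from data rather than hand-crafted.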

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion models, and LLM-based multimodal models, are revolutionizing the entire machine …

SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute

N Zheng, B Lin, Q Zhang, L Ma, Y Yang… - 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), 2022 - usenix.org
Sparsity is becoming arguably the most critical dimension to explore for efficiency and
scalability, as deep learning models grow significantly larger and more complex. After all …
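
SparTA itself is a compiler-level system, but the core idea (once a tensor's sparsity pattern is known as an attribute, execution can be specialized to it) can be illustrated with an off-the-shelf sparse format. The sparsity level and sizes below are assumptions:

    import numpy as np
    from scipy import sparse

    rng = np.random.default_rng(0)
    dense = rng.standard_normal((2048, 2048))
    dense[rng.random(dense.shape) < 0.95] = 0.0   # 95%-sparse weight, assumed pattern

    w_csr = sparse.csr_matrix(dense)              # storage specialized to the pattern
    x = rng.standard_normal((2048, 64))

    out_dense = dense @ x                         # dense kernel, ignores sparsity
    out_sparse = w_csr @ x                        # sparse kernel, exploits it
    print(np.allclose(out_dense, out_sparse))

The dense kernel does the same arithmetic on mostly-zero operands, while the sparse one skips them; SparTA generalizes this by propagating sparsity attributes through the whole computation graph and choosing specialized kernels end to end.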

Hessian-aware pruning and optimal neural implant

S Yu, Z Yao, A Gholami, Z Dong… - Proceedings of the …, 2022 - openaccess.thecvf.com
Pruning is an effective method to reduce the memory footprint and FLOPs associated with
neural network models. However, existing structured pruning methods often result in …
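
As background on second-order saliency, the classic idea that Hessian-aware methods refine: zeroing weight i changes the loss by approximately 0.5 * H_ii * w_i^2. The sketch below uses an empirical-Fisher proxy for the Hessian diagonal, which is an assumption for illustration, not this paper's estimator:

    import numpy as np

    def second_order_importance(w, h_diag):
        """OBD-style saliency: estimated loss increase from zeroing each weight."""
        return 0.5 * h_diag * w ** 2

    w = np.random.randn(1000)
    grads = np.random.randn(32, 1000)             # per-example gradients (stand-in)
    h_diag = (grads ** 2).mean(axis=0)            # empirical-Fisher proxy for diag(H)

    scores = second_order_importance(w, h_diag)
    prune_idx = np.argsort(scores)[: int(0.3 * w.size)]  # 30% lowest-saliency weights
    w[prune_idx] = 0.0

Ranking by curvature-weighted magnitude, rather than magnitude alone, protects weights that sit in sharp directions of the loss, which is why second-order criteria tend to prune more safely.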

Improving low-precision network quantization via bin regularization

T Han, D Li, J Liu, L Tian… - Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021 - openaccess.thecvf.com
Model quantization is an important mechanism for energy-efficient deployment of
deep neural networks on resource-constrained devices by reducing the bit precision of …
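
The bin regularization idea can be sketched as an auxiliary loss that pulls each full-precision weight toward its nearest quantization bin center, sharpening the weight distribution around the bins. This simplified per-tensor version is an illustration, not the paper's exact formulation:

    import torch

    def bin_regularizer(w, num_bits=4):
        """Penalize each weight's distance to its nearest uniform bin center."""
        qmax = 2 ** (num_bits - 1) - 1
        s = w.detach().abs().max() / qmax              # symmetric uniform bins
        nearest = torch.round(w / s).clamp(-qmax - 1, qmax) * s
        return ((w - nearest) ** 2).mean()

    w = torch.randn(1000, requires_grad=True)
    loss = bin_regularizer(w)      # added to the task loss during training
    loss.backward()                # gradients pull weights toward bin centers

Because the penalty shrinks the weight-to-bin gap before quantization is applied, the eventual rounding step discards less information, which is how such regularizers improve low-precision accuracy.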