Structured pruning for deep convolutional neural networks: A survey

Y He, L Xiao - IEEE transactions on pattern analysis and …, 2023 - ieeexplore.ieee.org
The remarkable performance of deep convolutional neural networks (CNNs) is generally
attributed to their deeper and wider architectures, which can come with significant …

Applications and techniques for fast machine learning in science

AMC Deiana, N Tran, J Agar, M Blott… - Frontiers in big …, 2022 - frontiersin.org
In this community review report, we discuss applications and techniques for fast machine
learning (ML) in science—the concept of integrating powerful ML methods into the real-time …

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
Neural Network computations, covering the advantages/disadvantages of current methods …

SqueezeLLM: Dense-and-sparse quantization

S Kim, C Hooper, A Gholami, Z Dong, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Large Language Models (LLMs) have demonstrated remarkable results for a
wide range of tasks. However, deploying these models for inference has been a significant …

LungNet: A hybrid deep-CNN model for lung cancer diagnosis using CT and wearable sensor-based medical IoT data

N Faruqui, MA Yousuf, M Whaiduzzaman… - Computers in Biology …, 2021 - Elsevier
Lung cancer, also known as pulmonary cancer, is one of the deadliest cancers, yet
curable if detected at the early stage. At present, the ambiguous features of the lung cancer …

The optimal BERT surgeon: Scalable and accurate second-order pruning for large language models

E Kurtic, D Campos, T Nguyen, E Frantar… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformer-based language models have become a key building block for natural
language processing. While these models are extremely accurate, they can be too large and …

Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

SQuant: On-the-fly data-free quantization via diagonal Hessian approximation

C Guo, Y Qiu, J Leng, X Gao, C Zhang, Y Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Quantization of deep neural networks (DNN) has been proven effective for compressing and
accelerating DNN models. Data-free quantization (DFQ) is a promising approach without the …

A comprehensive survey on model quantization for deep neural networks in image classification

B Rokh, A Azarpeyvand, A Khanteymoori - ACM Transactions on …, 2023 - dl.acm.org
Recent advancements in machine learning achieved by Deep Neural Networks (DNNs)
have been significant. While demonstrating high accuracy, DNNs are associated with a …

A comprehensive survey on model quantization for deep neural networks

B Rokh, A Azarpeyvand, A Khanteymoori - arXiv preprint arXiv …, 2022 - arxiv.org
Recent advances in machine learning by deep neural networks are significant. But using
these networks has been accompanied by a huge number of parameters for storage and …