Towards efficient post-training quantization of pre-trained language models

H Bai, L Hou, L Shang, X Jiang… - Advances in neural …, 2022 - proceedings.neurips.cc
Network quantization has gained increasing attention with the rapid growth of large pre-
trained language models (PLMs). However, most existing quantization methods for PLMs …
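
For context, a minimal post-training quantization step can be sketched as below: symmetric per-channel int8 rounding of a pre-trained weight matrix. This is a generic baseline, not the specific method proposed in this paper; all names and shapes are illustrative.

```python
import numpy as np

def quantize_per_channel(w, n_bits=8):
    """Symmetric per-channel quantization of each output row of w."""
    qmax = 2 ** (n_bits - 1) - 1                  # 127 for int8
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)      # guard all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 16).astype(np.float32)     # stand-in for a PLM layer
q, s = quantize_per_channel(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```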

Extreme compression of large language models via additive quantization

V Egiazarian, A Panferov, D Kuznedelev… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of accurate open large language models (LLMs) has led to a race towards
quantization techniques that enable execution of such models on end-user devices. In this …
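
Additive quantization represents each weight group as a sum of codewords, one per codebook. The sketch below uses a greedy residual variant with a crude k-means fit; the paper's method jointly optimizes the codes (e.g. via beam search), so this is only an approximation of the idea, with all sizes chosen for illustration.

```python
import numpy as np

def fit_codebooks(vectors, n_books=2, n_codes=16, n_iter=10, seed=0):
    """Greedy residual fit: each codebook is k-means on the current residual."""
    rng = np.random.default_rng(seed)
    residual = vectors.copy()
    books = []
    for _ in range(n_books):
        centers = residual[rng.choice(len(residual), n_codes, replace=False)]
        for _ in range(n_iter):
            ids = np.argmin(((residual[:, None] - centers[None]) ** 2).sum(-1), axis=1)
            for k in range(n_codes):
                if (ids == k).any():
                    centers[k] = residual[ids == k].mean(axis=0)
        books.append(centers)
        residual = residual - centers[ids]
    return books

def encode(vectors, books):
    """Assign one code per codebook; the decoded vector is the codeword sum."""
    codes, residual = [], vectors.copy()
    for centers in books:
        ids = np.argmin(((residual[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        codes.append(ids)
        residual = residual - centers[ids]
    return np.stack(codes, axis=1)

vecs = np.random.randn(1024, 8).astype(np.float32)
books = fit_codebooks(vecs)
print(encode(vecs, books).shape)    # (1024, 2): two 4-bit codes per vector
```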

Transform quantization for CNN compression

SI Young, W Zhe, D Taubman… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In this paper, we compress convolutional neural network (CNN) weights post-training via
transform quantization. Previous CNN quantization techniques tend to ignore the joint …
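
Transform quantization can be illustrated roughly as: decorrelate flattened filters with a KLT/PCA basis, quantize the transform coefficients, then invert the transform. The sketch below uses a uniform per-component step size rather than the paper's rate-optimized bit allocation.

```python
import numpy as np

def transform_quantize(w, n_bits=4):
    """w: (n_filters, d) flattened filters; returns the reconstruction."""
    mean = w.mean(axis=0)
    _, _, vt = np.linalg.svd(w - mean, full_matrices=False)
    coeff = (w - mean) @ vt.T                     # KLT/PCA coefficients
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(coeff).max(axis=0) / qmax      # uniform per-component step
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(coeff / scale), -qmax - 1, qmax)
    return (q * scale) @ vt + mean                # inverse transform

w = np.random.randn(64, 9).astype(np.float32)     # e.g. 64 3x3 filters
print("recon error:", np.abs(w - transform_quantize(w)).mean())
```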

Few shot network compression via cross distillation

H Bai, J Wu, I King, M Lyu - Proceedings of the AAAI Conference on …, 2020 - aaai.org
Model compression has been widely adopted to obtain lightweight deep neural
networks. Most prevalent methods, however, require fine-tuning with sufficient training data …
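
A few-shot layer-wise fitting step in this spirit can be sketched as below: a pruned student layer is fit, in closed form, to reproduce the teacher layer's outputs on a handful of calibration samples. This is a generic stand-in, not the paper's exact cross-distillation scheme; the helper names and shapes are invented for illustration.

```python
import numpy as np

def distill_layer(x, w_teacher, keep):
    """Fit a pruned student layer to the teacher's outputs in closed form.
    x: (n_samples, d_in) few-shot inputs; keep: retained input channels."""
    target = x @ w_teacher.T                      # teacher layer outputs
    w_student, *_ = np.linalg.lstsq(x[:, keep], target, rcond=None)
    return w_student.T                            # (d_out, len(keep))

x = np.random.randn(32, 64)                       # only 32 calibration samples
w_t = np.random.randn(16, 64)
keep = np.arange(0, 64, 2)                        # keep half the input channels
w_s = distill_layer(x, w_t, keep)
print("fit error:", np.abs(x[:, keep] @ w_s.T - x @ w_t.T).mean())
```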

FPFS: Filter-level pruning via distance weight measuring filter similarity

W Zhang, Z Wang - Neurocomputing, 2022 - Elsevier
Deep Neural Networks (DNNs) benefit greatly from convolution, but also bear a heavy
computational burden. Therefore, model compression techniques are used to …
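
The distance-based similarity idea can be illustrated as: measure each filter's distance to its nearest neighbor and prune the filters that have near-duplicates. The sketch below is a simplified reading of that idea, not the paper's exact FPFS measure.

```python
import numpy as np

def prune_similar_filters(filters, prune_ratio=0.25):
    """filters: (n, d) flattened conv filters; returns indices to keep."""
    f = filters / (np.linalg.norm(filters, axis=1, keepdims=True) + 1e-12)
    dist = np.linalg.norm(f[:, None] - f[None], axis=-1)   # pairwise L2
    np.fill_diagonal(dist, np.inf)
    redundancy = dist.min(axis=1)            # small = has a near-duplicate
    n_prune = int(len(filters) * prune_ratio)
    keep = np.argsort(redundancy)[n_prune:]  # drop the most redundant filters
    return np.sort(keep)

filters = np.random.randn(32, 27)            # 32 filters of shape 3x3x3
print("kept:", prune_similar_filters(filters))
```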

MedQ: Lossless ultra-low-bit neural network quantization for medical image segmentation

R Zhang, ACS Chung - Medical Image Analysis, 2021 - Elsevier
Implementing deep convolutional neural networks (CNNs) with Boolean arithmetic is ideal
for eliminating the notoriously high computational expense of deep learning models …
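
An ultra-low-bit baseline in this vein is XNOR-style weight binarization: each filter becomes its sign pattern times a per-filter scale. The sketch below shows only this standard building block, not MedQ's lossless scheme.

```python
import numpy as np

def binarize(w):
    """w: (n_filters, d). Return sign pattern in {-1, +1} and scales alpha."""
    alpha = np.abs(w).mean(axis=1, keepdims=True)   # XNOR-Net closed-form scale
    return np.sign(np.where(w == 0, 1.0, w)), alpha

w = np.random.randn(8, 9)
b, alpha = binarize(w)
print("binary recon error:", np.abs(w - alpha * b).mean())
```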

Heterogeneous model fusion federated learning mechanism based on model mapping

X Lu, Y Liao, C Liu, P Lio, P Hui - IEEE Internet of Things …, 2021 - ieeexplore.ieee.org
The computing power of Internet of Things (IoT) devices varies widely. To enable
IoT devices with lower computing power to perform machine learning, all nodes can only …
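
One naive way to fuse models of different sizes, sketched below, is to embed each client's smaller weight matrix in a corner of a global matrix and average overlapping entries. This is a generic illustration invented here; the paper's model-mapping mechanism is not specified in the snippet.

```python
import numpy as np

def aggregate(client_weights, global_shape):
    """Average each entry over the clients whose sub-matrix covers it."""
    acc = np.zeros(global_shape)
    cnt = np.zeros(global_shape)
    for w in client_weights:               # assume each client model occupies
        r, c = w.shape                     # the top-left corner of the global one
        acc[:r, :c] += w
        cnt[:r, :c] += 1
    return acc / np.maximum(cnt, 1)

clients = [np.random.randn(4, 4), np.random.randn(8, 8), np.random.randn(8, 8)]
print(aggregate(clients, (8, 8)).shape)    # fused global layer (8, 8)
```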

Fixed-point back-propagation training

X Zhang, S Liu, R Zhang, C Liu… - Proceedings of the …, 2020 - openaccess.thecvf.com
The recently emerged quantization technique (i.e., using low bit-width fixed-point data instead of
high bit-width floating-point data) has been applied to the inference of deep neural networks for …
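
Fixed-point training can be sketched as rounding tensors to a Qm.n format in both the forward pass and the gradient step. The toy example below assumes a linear layer under a squared loss; real schemes (including, presumably, this paper's) add dynamic scaling and more careful rounding.

```python
import numpy as np

def to_fixed_point(x, frac_bits=8, word_bits=16):
    """Round to signed fixed point with the given word and fraction widths."""
    step = 2.0 ** -frac_bits
    limit = 2.0 ** (word_bits - frac_bits - 1) - step
    return np.clip(np.round(x / step) * step, -limit, limit)

rng = np.random.default_rng(0)
w, x, y = rng.normal(size=(4, 4)), rng.normal(size=4), rng.normal(size=4)
err = to_fixed_point(w @ x) - y            # quantized forward pass
grad = 2 * np.outer(err, x)                # dL/dW for L = ||Wx - y||^2
w = w - 0.01 * to_fixed_point(grad)        # fixed-point gradient step
```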

A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging

M Cao, L Wang, H Wang, X Yuan - arXiv preprint arXiv:2407.21517, 2024 - arxiv.org
Video Snapshot Compressive Imaging (SCI) aims to use a low-speed 2D camera to capture a
high-speed scene as snapshot compressed measurements, followed by a reconstruction …
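
The standard SCI forward model behind this setup: T high-speed frames are modulated by binary masks and summed into a single 2D snapshot, y = Σ_t C_t ⊙ x_t. The sketch below simulates that measurement only; it does not touch the paper's quantized reconstruction network.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, W = 8, 32, 32
frames = rng.random((T, H, W))             # high-speed scene frames x_t
masks = rng.integers(0, 2, (T, H, W))      # binary modulation masks C_t
measurement = (masks * frames).sum(axis=0) # snapshot y = sum_t C_t * x_t
print(measurement.shape)                   # a single 2D measurement
```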

Towards efficient network compression via Few-Shot Slimming

J He, Y Ding, M Zhang, D Li - Neural Networks, 2022 - Elsevier
While previous network compression methods achieve great success, most of them rely on
abundant training data, which is, unfortunately, often unavailable in practice due to some …
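
A common slimming criterion that few-shot variants build on is ranking channels by the magnitude of their BatchNorm scales. The sketch below shows that generic criterion only; it is not necessarily the paper's few-shot procedure.

```python
import numpy as np

def slim_channels(bn_gamma, keep_ratio=0.5):
    """Keep the channels whose BatchNorm scale has the largest magnitude."""
    n_keep = max(1, int(len(bn_gamma) * keep_ratio))
    return np.sort(np.argsort(-np.abs(bn_gamma))[:n_keep])

gamma = np.random.randn(16)                # BN scales of one conv layer
print("kept channels:", slim_channels(gamma))
```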