Rate distortion for model compression: From theory to practice

W Gao, YH Liu, C Wang, S Oh - International Conference on …, 2019 - proceedings.mlr.press
The enormous size of modern deep neural networks makes it challenging to deploy those
models in memory and communication limited scenarios. Thus, compressing a trained …

Model compression as constrained optimization, with application to neural nets. Part V: combining compressions

MÁ Carreira-Perpiñán, Y Idelbayev - arXiv preprint arXiv:2107.04380, 2021 - arxiv.org
Model compression is generally performed by using quantization, low-rank approximation or
pruning, for which various algorithms have been researched in recent years. One …

A flexible, extensible software framework for model compression based on the LC algorithm

Y Idelbayev, MÁ Carreira-Perpiñán - arXiv preprint arXiv:2005.07786, 2020 - arxiv.org
We propose a software framework based on the ideas of the Learning-Compression (LC)
algorithm that allows a user to compress a neural network or other machine learning model …

Model preserving compression for neural networks

J Chee, A Damle, CM De Sa - Advances in Neural …, 2022 - proceedings.neurips.cc
After training complex deep learning models, a common task is to compress the model to
reduce compute and storage demands. When compressing, it is desirable to preserve the …

A programmable approach to model compression

V Joseph, S Muralidharan, A Garg… - arXiv preprint arXiv …, 2019 - researchgate.net
Deep neural networks frequently contain far more weights, represented at a higher
precision, than are required for the specific task which they are trained to perform …

LC: A flexible, extensible open-source toolkit for model compression

Y Idelbayev, MÁ Carreira-Perpiñán - Proceedings of the 30th ACM …, 2021 - dl.acm.org
The continued increase in memory, runtime and energy consumption of deployed machine
learning models on one side, and the trend to miniaturize intelligent devices and sensors on …

SPDY: Accurate pruning with speedup guarantees

E Frantar, D Alistarh - International Conference on Machine …, 2022 - proceedings.mlr.press
The recent focus on the efficiency of deep neural networks (DNNs) has led to significant
work on model compression approaches, of which weight pruning is one of the most …

Spectral pruning: Compressing deep neural networks via spectral analysis and its generalization error

T Suzuki, H Abe, T Murata, S Horiuchi, K Ito… - arXiv preprint arXiv …, 2018 - arxiv.org
Compression techniques for deep neural network models are becoming very important for
the efficient execution of high-performance deep learning systems on edge-computing …

More general and effective model compression via an additive combination of compressions

Y Idelbayev, MÁ Carreira-Perpiñán - … 13–17, 2021, Proceedings, Part III 21, 2021 - Springer
Model compression is generally performed by using quantization, low-rank
approximation or pruning, for which various algorithms have been researched in recent …

Going beyond classification accuracy metrics in model compression

V Joseph, SA Siddiqui, A Bhaskara… - arXiv preprint arXiv …, 2020 - arxiv.org
With the rise in edge-computing devices, there has been an increasing demand to deploy
energy and resource-efficient models. A large body of research has been devoted to …