Authors
Ao Ren, Tianyun Zhang, Shaokai Ye, Jiayu Li, Wenyao Xu, Xuehai Qian, Xue Lin, Yanzhi Wang
Publication date
2019/4/4
Conference
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems
Pages
925-938
Publisher
ACM
Description
Model compression is an important technique to facilitate efficient embedded and hardware implementations of deep neural networks (DNNs), and a number of prior works are dedicated to model compression techniques. The target is to simultaneously reduce the model storage size and accelerate the computation, with minor effect on accuracy. Two important categories of DNN model compression techniques are weight pruning and weight quantization. The former leverages the redundancy in the number of weights, whereas the latter leverages the redundancy in the bit representation of weights. These two sources of redundancy can be combined, leading to a higher degree of DNN model compression. However, a systematic framework for joint weight pruning and quantization of DNNs is lacking, limiting the achievable model compression ratio. Moreover, the computation reduction, energy efficiency …
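The abstract names two sources of redundancy, the number of weights (pruning) and their bit representation (quantization), and argues they can be combined. As a generic illustration only, not the paper's framework, the sketch below applies magnitude pruning followed by uniform quantization of the surviving weights; the function names and parameters (magnitude_prune, uniform_quantize, sparsity, bits) are assumptions introduced here for clarity.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude `sparsity` fraction of weights."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < thresh, 0.0, w)

def uniform_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Round weights to 2**bits evenly spaced levels over [w.min(), w.max()]."""
    lo, hi = float(w.min()), float(w.max())
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0  # guard against a constant tensor
    return np.round((w - lo) / scale) * scale + lo

# Joint compression (illustrative, not the paper's method): prune first,
# then quantize the survivors, keeping pruned positions exactly zero.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
mask = magnitude_prune(w, sparsity=0.75) != 0
w_joint = uniform_quantize(w, bits=4) * mask
```

With 75% sparsity and 4-bit levels, storage drops from 32 bits per weight to an index plus a 4-bit code for each surviving weight, which is the kind of combined compression ratio the abstract refers to.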
Total citations
[Citations-per-year chart, 2018–2024]
Scholar articles
A Ren, T Zhang, S Ye, J Li, W Xu, X Qian, X Lin… - Proceedings of the Twenty-Fourth International …, 2019