Authors
Vitor Finotti, Bruno Albertini
Publication date
2021/10/1
Journal
Computers and Electrical Engineering
Volume
95
Pages
107446
Publisher
Pergamon
Description
Mobile and embedded applications of convolutional neural networks (CNNs) use quantization to reduce model size and increase computational efficiency. However, working with quantized networks often implies using non-standard training and execution methods, as modern frameworks offer limited support to fixed-point operations. We propose a quantization approach simulating the effects of quantization in CNN inference without needing to be re-implemented using fixed-point arithmetic, reducing overhead and complexity in evaluating existing networks’ responses to quantization. The proposed method provides a fast way of performing post-training quantization with different bit widths in activations and weights. Our experimental results on ImageNet CNNs show a model size reduction of more than 50%, while maintaining classification accuracy without a need for retraining. We also measured the relationship …
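The abstract describes simulating quantization effects in floating point rather than re-implementing the network in fixed-point arithmetic. Below is a minimal sketch of that "fake quantization" idea, assuming symmetric uniform per-tensor quantization; the paper's exact scheme may differ, and the names `fake_quantize`, `num_bits` are illustrative, not from the source.

```python
import numpy as np

def fake_quantize(x: np.ndarray, num_bits: int) -> np.ndarray:
    """Quantize-dequantize `x` onto a signed `num_bits` integer grid.

    The output is still floating point, so it can flow through an
    unmodified inference pipeline while exhibiting quantization error.
    """
    qmax = 2 ** (num_bits - 1) - 1        # e.g. 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax      # per-tensor symmetric scale
    if scale == 0.0:
        return x                          # all-zero tensor: nothing to quantize
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                      # back to float ("fake" quantization)

# Example: simulate 8-bit weights and 6-bit activations post-training.
weights = np.random.randn(64, 3, 3, 3).astype(np.float32)
activations = np.random.randn(1, 32, 32, 3).astype(np.float32)
w_q = fake_quantize(weights, num_bits=8)
a_q = fake_quantize(activations, num_bits=6)
print("max weight error:", np.abs(weights - w_q).max())
```

Applying such a function to weights and activations at different bit widths makes it possible to measure a trained model's accuracy under quantization without retraining or a fixed-point reimplementation, which is the evaluation workflow the abstract claims to speed up.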