Authors
Urmish Thakker, Jesse Beu, Dibakar Gope, Ganesh Dasika, Matthew Mattina
Publication date
2019/6/23
Workshop paper
4th edition of the Workshop on Energy Efficient Machine Learning and Cognitive Computing, co-located with the 46th International Symposium on Computer Architecture (ISCA)
Description
Recurrent neural networks can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities while still having run-time constraints. As a result, there is a need for compression techniques that can achieve significant compression without negatively impacting inference run-time and task accuracy. This paper explores a new compressed RNN cell implementation called Hybrid Matrix Decomposition (HMD) that achieves this dual objective. HMD creates dense matrices that result in output features where the upper sub-vector carries "richer" features while the lower sub-vector carries "constrained" features. On the benchmarks evaluated in this paper, this results in faster inference run-time than pruning and better accuracy than matrix factorization for compression factors of 2-4x.
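The abstract describes the idea only at a high level, so here is a minimal sketch of one plausible reading: the upper rows of an RNN weight matrix are kept fully dense (the "richer" upper features), while the lower rows are replaced by a low-rank product (the "constrained" lower features). The function names (hmd_compress, hmd_matvec), the split/rank parameters, and the use of truncated SVD to form the low-rank factors are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def hmd_compress(W, dense_rows, rank):
    """Hypothetical HMD-style split of a weight matrix W (m x n).

    Top `dense_rows` rows stay dense; the remaining rows are
    approximated by a rank-`rank` factorization (here via truncated
    SVD, which is an assumption -- the factors could also be trained).
    """
    D = W[:dense_rows, :]                       # dense block -> "richer" upper features
    L = W[dense_rows:, :]                       # block to compress
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    A = U[:, :rank] * s[:rank]                  # (m - dense_rows) x rank
    B = Vt[:rank, :]                            # rank x n
    return D, A, B

def hmd_matvec(D, A, B, x):
    """y ~= W @ x: one dense GEMV for the upper block,
    two thin GEMVs for the low-rank lower block."""
    y_upper = D @ x                             # unconstrained features
    y_lower = A @ (B @ x)                       # "constrained" low-rank features
    return np.concatenate([y_upper, y_lower])

# Illustrative usage on a 256x256 matrix (~1.7x fewer parameters here).
W = np.random.randn(256, 256).astype(np.float32)
D, A, B = hmd_compress(W, dense_rows=96, rank=32)
x = np.random.randn(256).astype(np.float32)
y = hmd_matvec(D, A, B, x)
```

Note how the dense block keeps inference as plain matrix-vector products, which is consistent with the abstract's claim of faster run-time than unstructured pruning at comparable compression.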
Total citations
[Citations-per-year chart, 2019–2024; per-year counts not recoverable from the extraction]
Articles in Google Scholar
U Thakker, J Beu, D Gope, G Dasika, M Mattina - 2019 2nd Workshop on Energy Efficient Machine …, 2019