
Exploring the performance bound of Cambricon accelerator in end-to-end inference scenario

Authors

Yifan Wang, Chundian Li, Chen Zeng

Publication date

2019/11/14

Book

International Symposium on Benchmarking, Measuring and Optimization

Pages

67-74

Publisher

Springer International Publishing

Description

Deep learning algorithms have become pervasive in a broad range of industrial application scenarios. The DianNao/Cambricon family is a set of energy-efficient hardware accelerators for machine learning, especially deep learning, covering devices from edge embedded systems to cloud data centers. However, in real application scenarios, the complicated software stack and the extra overhead (memory copy) hinder full exploitation of the accelerator's performance. In this paper, we explore the performance bound of the Cambricon MLU100 accelerator in end-to-end deep learning inference scenarios (from loading the data and model to storing the inference results). We leverage the offline model to bypass the general deep learning framework, use multithreaded programming to fully exploit the parallelism of the multi-core accelerator, and apply a specific data structure to reduce the memory copy overhead. The evaluation results show …

Total citations

Cited by 3

Citations per year: 2019: 1, 2020: 2
