Authors
Hongwu Peng, Shaoyi Huang, Tong Geng, Ang Li, Weiwen Jiang, Hang Liu, Shusen Wang, Caiwen Ding
Publication date
2021/4/7
Conference
2021 22nd International Symposium on Quality Electronic Design (ISQED)
Pages
142-148
Publisher
IEEE
Description
Although Transformer-based language representations achieve state-of-the-art accuracy on various natural language processing (NLP) tasks, their large model size poses a challenge for resource-constrained computing platforms. Weight pruning, a popular and effective technique for reducing the number of weight parameters and accelerating the Transformer, has been investigated on GPUs. However, Transformer acceleration using weight pruning on field-programmable gate arrays (FPGAs) remains unexplored. This paper investigates column-balanced block-wise pruning of the Transformer and designs an FPGA acceleration engine customized for the balanced block-wise matrix multiplication. We implement the Transformer model with proper hardware scheduling, and the experiments show that Transformer inference on the FPGA achieves 10.35 ms latency with a batch size of 32, which is 10.96 …
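The abstract describes column-balanced block-wise pruning: the weight matrix is partitioned into blocks, and each block-column keeps the same number of surviving blocks so that the sparse matrix multiplication has a balanced workload across hardware processing elements. Below is a minimal NumPy sketch of that idea; the block size, keep ratio, function name, and block-importance criterion (block L2 norm) are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def column_balanced_block_pruning(W, block_size=16, keep_ratio=0.25):
    """Zero out weight blocks so every block-column keeps the same
    number of surviving blocks (sketch of column-balanced block-wise
    pruning; parameters are illustrative, not from the paper)."""
    rows, cols = W.shape
    assert rows % block_size == 0 and cols % block_size == 0
    n_block_rows = rows // block_size
    n_block_cols = cols // block_size
    keep_per_col = max(1, int(round(keep_ratio * n_block_rows)))

    mask = np.zeros_like(W)
    for bc in range(n_block_cols):
        col_slice = slice(bc * block_size, (bc + 1) * block_size)
        # importance of each block in this block-column (L2 norm, assumed)
        norms = np.array([
            np.linalg.norm(W[br * block_size:(br + 1) * block_size, col_slice])
            for br in range(n_block_rows)
        ])
        # keep the strongest blocks; the same count in every block-column
        for br in np.argsort(norms)[-keep_per_col:]:
            mask[br * block_size:(br + 1) * block_size, col_slice] = 1.0
    return W * mask, mask
```

Because every block-column retains exactly `keep_per_col` blocks, the surviving nonzero blocks can be packed into a dense, regular layout, which is what makes the matrix multiplication easy to schedule on an FPGA accelerator.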
Total citations
Scholar articles
H Peng, S Huang, T Geng, A Li, W Jiang, H Liu… - 2021 22nd International Symposium on Quality …, 2021