Authors
Hui Wei, Enjie Liu, Youbing Zhao, Hongqing Yu
Publication date
2020/10/20
Conference
Computer Graphics International Conference
Pages
411-418
Publisher
Springer, Cham
Description
This paper presents an optimized implementation of Winograd non-fused convolution. Our optimizations comprise application-independent grouped producer-consumer chains and a set of Winograd-specific software techniques, including a specialized interface-kernels data format, which enhances memory access efficiency; warp specialization and double-buffer prefetching, which effectively exploit computational resources and memory bandwidth; and use of the “shuffle” instruction, which conserves hardware resources. The paper also provides a supplementary explanation of Winograd tile extraction, which saves memory and computing resources.
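To make the term "non-fused" and the tile-extraction step concrete, the following is a minimal CPU sketch in Python. It assumes the standard F(2×2, 3×3) Winograd algorithm, a stride-1 "valid" 3×3 convolution (cross-correlation, as in CNN layers), and zero padding of ragged edge tiles. It runs the four stages that a non-fused implementation executes as separate kernels: filter transform, input transform with overlapping 4×4 tiles extracted at stride 2, element-wise products summed over channels, and output transform. All function names are hypothetical; this is not the paper's CUDA implementation and does not model its warp specialization, double buffering, or shuffle-based optimizations.

```python
# Hypothetical CPU reference sketch of non-fused Winograd F(2x2, 3x3);
# not the paper's GPU kernels. Pure Python, no third-party dependencies.
from typing import List

Matrix = List[List[float]]

# Standard F(2x2, 3x3) transform matrices.
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def extract_tiles(img: Matrix):
    """Split one channel into overlapping 4x4 tiles at stride 2 (zero-padded at edges)."""
    h, w = len(img), len(img[0])
    oh, ow = h - 2, w - 2                    # valid 3x3 output size
    th, tw = (oh + 1) // 2, (ow + 1) // 2    # 2x2 output tiles per axis

    def px(r: int, c: int) -> float:
        return img[r][c] if r < h and c < w else 0.0

    tiles = [[[px(2 * ty + i, 2 * tx + j) for j in range(4)] for i in range(4)]
             for ty in range(th) for tx in range(tw)]
    return tiles, th, tw, oh, ow


def winograd_conv(inputs, filters):
    """inputs: C x H x W, filters: K x C x 3 x 3 -> K x (H-2) x (W-2)."""
    C, K = len(inputs), len(filters)
    # Stage 1 (filter transform): U[k][c] = G g G^T, a 4x4 tile per filter.
    U = [[matmul(matmul(G, filters[k][c]), transpose(G)) for c in range(C)] for k in range(K)]
    # Stage 2 (input transform): V[c][t] = B^T d B for every extracted tile d.
    B = transpose(BT)
    V = []
    for c in range(C):
        tiles, th, tw, oh, ow = extract_tiles(inputs[c])
        V.append([matmul(matmul(BT, d), B) for d in tiles])
    T = th * tw
    # Stage 3: for each of the 16 transform-domain positions, a (K x C)(C x T)
    # product; written here as explicit sums over channels.
    M = [[[[sum(U[k][c][i][j] * V[c][t][i][j] for c in range(C)) for j in range(4)]
           for i in range(4)] for t in range(T)] for k in range(K)]
    # Stage 4 (output transform): Y = A^T M A, scatter 2x2 tiles, crop padding.
    A = transpose(AT)
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(K)]
    for k in range(K):
        for t in range(T):
            y = matmul(matmul(AT, M[k][t]), A)
            ty, tx = divmod(t, tw)
            for i in range(2):
                for j in range(2):
                    r, s = 2 * ty + i, 2 * tx + j
                    if r < oh and s < ow:
                        out[k][r][s] = y[i][j]
    return out


def direct_conv(inputs, filters):
    """Direct 3x3 cross-correlation, used only to check the sketch."""
    C, K = len(inputs), len(filters)
    oh, ow = len(inputs[0]) - 2, len(inputs[0][0]) - 2
    return [[[sum(inputs[c][r + i][s + j] * filters[k][c][i][j]
                  for c in range(C) for i in range(3) for j in range(3))
              for s in range(ow)] for r in range(oh)] for k in range(K)]


if __name__ == "__main__":
    x = [[[float((r * 5 + s) % 7) for s in range(6)] for r in range(5)]]   # C=1, 5x6 input
    w = [[[[1.0, 0.0, -1.0], [1.0, 0.0, -1.0], [1.0, 0.0, -1.0]]]]        # K=1, C=1
    y_w, y_d = winograd_conv(x, w), direct_conv(x, w)
    print(max(abs(a - b) for ra, rb in zip(y_w[0], y_d[0]) for a, b in zip(ra, rb)))
```

Each 2×2 output tile needs a 4×4 input tile, so neighbouring tiles overlap by two rows or columns. Padding the last tile row or column with zeros and cropping the result, as above, is one simple way to handle output sizes that are not multiples of 2.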
The proposed techniques have been evaluated head-to-head at the kernel level on a GTX 980 GPU with CUDA 9.2, using a wide range of parameters that match CNN layer benchmarks. Compared with the state-of-the-art Winograd non-fused convolution in cuDNN 7.6.4 (released in Sept …
Total citations
[Chart: citations per year, 2021–2024]
Scholar articles
H Wei, E Liu, Y Zhao, H Yu - Advances in Computer Graphics: 37th Computer …, 2020