Design and implementation of 2D convolution on x86/x64 processors

V Kelefouras, G Keramidas. IEEE Transactions on Parallel and Distributed Systems, 2022. ieeexplore.ieee.org

In this paper, a new method for accelerating the 2D direct convolution operation on x86/x64 processors is presented. It combines efficient vectorization using SIMD intrinsics, bit-twiddling optimizations, optimization of the division operation, multi-threading with OpenMP, register blocking, and the shortest possible bit width for the intermediate results. The proposed method, which is provided as open source, is general and can be applied to other processor families too, e.g., Arm. It has been evaluated on two different multi-core Intel CPUs, using twenty different image sizes, 8-bit integer computations, and the most commonly used kernel sizes (3x3, 5x5, 7x7, 9x9). It achieves a speedup over the Intel IPP library (OpenCV GaussianBlur and Filter2D routines), over the gemm-based convolution method (using the Intel MKL int8 matrix multiplication routine), and over the vslsConvExec Intel MKL direct convolution routine. The proposed method is superior because it executes far fewer arithmetic and load/store instructions.
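Two of the ideas named in the abstract, narrow intermediate results and a cheaper division, can be illustrated with a small scalar sketch. This is not the authors' code: the function names `div9_u16` and `box3x3_u8`, the 3x3 box kernel, and the copy-the-border policy are all assumptions chosen for brevity. The sketch keeps the nine-pixel sum in 16 bits (the maximum is 9 x 255 = 2295) and replaces the division by 9 with a multiply and a shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: floor(x / 9) for 0 <= x <= 2295 (= 9 * 255),
   computed with a multiply and a shift instead of a division.
   7282 = ceil(2^16 / 9); the result is exact over this input range. */
static inline uint8_t div9_u16(uint16_t x)
{
    return (uint8_t)(((uint32_t)x * 7282u) >> 16);
}

/* Hypothetical 3x3 box blur on an 8-bit grayscale image (row-major, w x h).
   Border pixels are copied unchanged to keep the sketch short. */
static void box3x3_u8(const uint8_t *src, uint8_t *dst, size_t w, size_t h)
{
    assert(src && dst && w >= 3 && h >= 3);

    /* Copy the top and bottom rows, then the left and right columns. */
    for (size_t x = 0; x < w; ++x) {
        dst[x] = src[x];
        dst[(h - 1) * w + x] = src[(h - 1) * w + x];
    }
    for (size_t y = 0; y < h; ++y) {
        dst[y * w] = src[y * w];
        dst[y * w + w - 1] = src[y * w + w - 1];
    }

    for (size_t y = 1; y + 1 < h; ++y) {
        const uint8_t *r0 = src + (y - 1) * w;
        const uint8_t *r1 = src + y * w;
        const uint8_t *r2 = src + (y + 1) * w;
        for (size_t x = 1; x + 1 < w; ++x) {
            /* Nine 8-bit values sum to at most 2295, so 16 bits suffice. */
            uint16_t acc = (uint16_t)(r0[x - 1] + r0[x] + r0[x + 1]
                                    + r1[x - 1] + r1[x] + r1[x + 1]
                                    + r2[x - 1] + r2[x] + r2[x + 1]);
            dst[y * w + x] = div9_u16(acc);
        }
    }
}
```

Keeping the accumulator at 16 bits matters mostly in a vectorized version, where a narrower element type means more pixels per SIMD register. The multiply-and-shift form has the same shape as a 16-bit multiply-high instruction such as the one behind the SSE2 intrinsic `_mm_mulhi_epu16`, although this sketch stays scalar so that it compiles on any target.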

Cited by 9 · Related articles · All 3 versions