Design and implementation of 2D convolution on x86/x64 processors

V Kelefouras, G Keramidas - IEEE Transactions on Parallel …, 2022 - ieeexplore.ieee.org
IEEE Transactions on Parallel and Distributed Systems, 2022
In this paper, a new method for accelerating the 2D direct convolution operation on x86/x64 processors is presented. It combines efficient vectorization using SIMD intrinsics, bit-twiddling optimizations, optimization of the division operation, multi-threading with OpenMP, register blocking, and the shortest possible bit width for intermediate results. The proposed method, which is provided as open source, is general and can also be applied to other processor families, e.g., Arm. It has been evaluated on two different multi-core Intel CPUs using twenty different image sizes, 8-bit integer computations, and the most commonly used kernel sizes (3x3, 5x5, 7x7, 9x9). It achieves speedups over the Intel IPP library (OpenCV GaussianBlur and Filter2D routines), over the gemm-based convolution method (using the Intel MKL int8 matrix multiplication routine), and over the vslsConvExec Intel MKL direct convolution routine. The proposed method is superior because it executes far fewer arithmetic and load/store instructions.
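To make three of the listed ideas concrete, here is a minimal scalar sketch in C. It is not the authors' implementation. It assumes a 3x3 Gaussian kernel whose weights sum to 16, an 8-bit grayscale image in row-major order, and borders that are simply copied. The function name `gaussian_blur_3x3_u8` is hypothetical. The sketch shows a 16-bit accumulator as the narrowest type that holds the intermediate sum, a right shift in place of the division by 16, and an OpenMP pragma over rows. SIMD intrinsics and register blocking are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the paper: 3x3 Gaussian blur on an
   8-bit grayscale image stored row-major. Border pixels are copied as-is. */
void gaussian_blur_3x3_u8(const uint8_t *in, uint8_t *out, int rows, int cols)
{
    /* Assumed kernel: weights sum to 16, so normalisation is a shift by 4. */
    static const uint16_t k[3][3] = { {1, 2, 1}, {2, 4, 2}, {1, 2, 1} };

    /* Copy the one-pixel border unchanged. */
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            if (r == 0 || c == 0 || r == rows - 1 || c == cols - 1)
                out[(size_t)r * cols + c] = in[(size_t)r * cols + c];

    /* Rows are independent, so they can be split across threads.
       Without -fopenmp the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (int r = 1; r < rows - 1; r++) {
        for (int c = 1; c < cols - 1; c++) {
            /* Largest possible sum is 255 * 16 = 4080, which fits in 16 bits. */
            uint16_t acc = 0;
            for (int i = -1; i <= 1; i++)
                for (int j = -1; j <= 1; j++)
                    acc = (uint16_t)(acc + k[i + 1][j + 1] *
                                     in[(size_t)(r + i) * cols + (c + j)]);
            /* Division by 16 replaced by a shift; the result is truncated. */
            out[(size_t)r * cols + c] = (uint8_t)(acc >> 4);
        }
    }
}
```

Keeping the accumulator at 16 bits matters mainly once the loop is vectorized, since a narrower element type lets each SIMD register hold more pixels. In this scalar form it only documents the value range.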