Implementing LU and Cholesky factorizations on artificial intelligence accelerators

Y Lu, Y Luo, H Lian, Z Jin, W Liu
CCF Transactions on High Performance Computing, 2021, Springer
Abstract
LU and Cholesky factorizations of dense matrices are among the most fundamental building blocks in a number of numerical applications. Because of their O(n^3) complexity, they may be the most time-consuming basic kernels in numerical linear algebra. For this reason, accelerating them on a variety of modern parallel processors has received much attention. In this paper, we implement LU and Cholesky factorizations on novel massively parallel artificial intelligence (AI) accelerators originally developed for deep neural network applications. We explore the data parallelism of the matrix factorizations, and exploit the neural compute units and on-chip scratchpad memories of modern AI chips to accelerate them. The experimental results show that our various optimization methods bring performance improvements: on a Cambricon AI accelerator, LU and Cholesky factorizations reach up to 41.54 and 19.77 GFlop/s in single precision, and 78.37 and 33.85 GFlop/s in half precision, respectively.
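To make the O(n^3) kernel concrete, here is a minimal unblocked Cholesky factorization in plain Python. This is only a reference sketch of the textbook algorithm; the paper's actual implementations are blocked and mapped onto the accelerator's neural compute units and scratchpad memories.

```python
import math

def cholesky(a):
    """Unblocked Cholesky factorization of a symmetric positive-definite
    matrix a (list of row lists): returns lower-triangular L with
    a = L * L^T. The three nested loops give the O(n^3) cost."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract squares of already-computed row entries.
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            L[i][j] = (a[i][j]
                       - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

# Example: factor a small SPD matrix.
A = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(A)  # L[0][0] = 2.0, L[1][0] = 1.0, L[1][1] = sqrt(2)
```

LU factorization follows the same triple-loop pattern (with pivoting added for stability), which is why both kernels benefit from the same blocking and data-parallel optimizations.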