Implementing LU and Cholesky factorizations on artificial intelligence accelerators

Y Lu, Y Luo, H Lian, Z Jin, W Liu. CCF Transactions on High Performance Computing, 2021. Springer.

Abstract

LU and Cholesky factorizations of dense matrices are among the most fundamental building blocks in numerical applications. Because of their O(n^3) complexity, they may be the most time-consuming basic kernels in numerical linear algebra. For this reason, accelerating them on a variety of modern parallel processors has received much attention. In this paper, we implement LU and Cholesky factorizations on novel massively parallel artificial intelligence (AI) accelerators originally developed for deep neural network applications. We explore the data parallelism of the matrix factorizations and exploit the neural compute units and on-chip scratchpad memories of modern AI chips to accelerate them. The experimental results show that our optimization methods improve performance, reaching up to 41.54 GFlop/s (LU) and 19.77 GFlop/s (Cholesky) in single precision and 78.37 GFlop/s (LU) and 33.85 GFlop/s (Cholesky) in half precision on a Cambricon AI accelerator.
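
For readers unfamiliar with the two kernels, the following is a minimal pure-Python sketch of unblocked LU (Doolittle, without pivoting) and Cholesky factorization. It is not the authors' accelerator implementation, which the abstract does not detail; the function names are hypothetical, and the flop counts use the conventional leading-order estimates (2n^3/3 for LU, n^3/3 for Cholesky), which is only an assumption about how GFlop/s figures of this kind are usually derived.

```python
import math


def lu_nopivot(a):
    """Doolittle LU without pivoting: return (L, U) such that A = L * U.

    Assumes every pivot u[k][k] is nonzero (no row exchanges are done).
    """
    n = len(a)
    u = [row[:] for row in a]
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        # The updates of rows k+1..n-1 are independent of one another,
        # which is the kind of data parallelism an accelerator can exploit.
        for i in range(k + 1, n):
            l[i][k] = u[i][k] / u[k][k]
            for j in range(k, n):
                u[i][j] -= l[i][k] * u[k][j]
    return l, u


def cholesky(a):
    """Return lower-triangular L with A = L * L^T.

    Assumes A is symmetric positive definite.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        diag = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        l[j][j] = math.sqrt(diag)
        for i in range(j + 1, n):
            dot = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - dot) / l[j][j]
    return l


def lu_flops(n):
    """Leading-order flop count for LU of an n x n matrix."""
    return 2 * n ** 3 / 3


def cholesky_flops(n):
    """Leading-order flop count for Cholesky of an n x n matrix."""
    return n ** 3 / 3


def gflops(flops, seconds):
    """Convert a flop count and a wall-clock time into GFlop/s."""
    return flops / seconds / 1e9


# Small symmetric positive definite example matrix (illustrative only).
A = [[4, 2, 2],
     [2, 5, 3],
     [2, 3, 6]]
L_lu, U_lu = lu_nopivot(A)
L_chol = cholesky(A)
```

Both routines contain three nested loops over the matrix dimension, which is where the O(n^3) cost comes from; Cholesky does roughly half the work of LU because it exploits symmetry.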

Cited by: 2 (4 versions)
