Accelerating sparse LU factorization with density-aware adaptive matrix multiplication for circuit simulation

T Wang, W Li, H Pei, Y Sun, Z Jin, W Liu
2023 60th ACM/IEEE Design Automation Conference (DAC), 2023. ieeexplore.ieee.org
Sparse LU factorization is one of the most time-consuming components of circuit simulation, particularly for the large circuits of the advanced process era. It can be accelerated by exploiting the supernode structure, which partitions the matrix into dense sub-matrices so that the work can be done with level-3 Basic Linear Algebra Subprograms (BLAS) General Matrix Multiplication (GEMM) operations. However, the sparse and irregular structure of circuit matrices often impedes the formation of supernodes, or produces supernodes with many zero elements, which makes GEMM hard to exploit. In this paper, by fully utilizing the density of the sub-matrices and combining GEMM with Dense-Sparse Matrix Multiplication (SpMM), we propose a density-aware adaptive matrix multiplication, equipped with machine learning techniques, that optimizes the most time-consuming matrix multiplication operator and thereby accelerates sparse LU factorization. Numerical experiments show that, across the six circuit matrices tested, our algorithm improves matrix multiplication performance by 5.35x on average (up to 9.35x) compared with using GEMM directly in Schur-complement updates. Compared with the state-of-the-art solver SuperLU_DIST, our method shows a substantial performance improvement.
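The idea of picking a multiplication kernel from the density of a sub-matrix can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe their selection model, so a fixed threshold (the hypothetical `threshold=0.5`) stands in for the machine-learning component, the left operand is assumed to be the sparse one, and the names `density`, `gemm`, `spmm`, and `schur_update` are made up for illustration. Pure-Python loops are used only to keep the example self-contained.

```python
# Illustrative sketch only. A fixed density threshold stands in for the
# paper's learned kernel selection, which the abstract does not detail.

def density(block):
    """Fraction of nonzero entries in a dense, row-major block."""
    total = sum(len(row) for row in block)
    nonzeros = sum(1 for row in block for v in row if v != 0)
    return nonzeros / total if total else 0.0

def gemm(a, b):
    """Dense product: visits every entry of a, zero or not."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            aip = a[i][p]
            for j in range(m):
                c[i][j] += aip * b[p][j]
    return c

def spmm(a, b):
    """Sparse-times-dense product: skips the zero entries of a."""
    m = len(b[0])
    c = [[0.0] * m for _ in a]
    for i, row in enumerate(a):
        for p, aip in enumerate(row):
            if aip != 0:
                for j in range(m):
                    c[i][j] += aip * b[p][j]
    return c

def schur_update(s, l_block, u_block, threshold=0.5):
    """Return S - L*U and the name of the kernel chosen from L's density."""
    kernel = gemm if density(l_block) >= threshold else spmm
    prod = kernel(l_block, u_block)
    updated = [[sij - pij for sij, pij in zip(srow, prow)]
               for srow, prow in zip(s, prod)]
    return updated, kernel.__name__
```

Both kernels compute the same product; only the amount of work differs. In a real solver the dense path would call an optimized BLAS GEMM and the sparse path a compressed-format SpMM routine, and the crossover point would depend on block shape and hardware, which is presumably what a learned selector is meant to capture.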