SMAT: An input adaptive auto-tuner for sparse matrix-vector multiplication

J Li, G Tan, M Chen, N Sun - Proceedings of the 34th ACM SIGPLAN conference on Programming language …, 2013 - dl.acm.org
Sparse matrix-vector multiplication (SpMV) is an important kernel in both traditional high-performance computing and emerging data-intensive applications. So far, SpMV libraries have been optimized with either application-specific or architecture-specific approaches, which makes them too complicated to be used widely in real applications. In this work we develop a Sparse Matrix-vector multiplication Auto-Tuning system (SMAT) to bridge the gap between specific optimizations and general-purpose usage. SMAT provides users with a unified programming interface in compressed sparse row (CSR) format and automatically determines the optimal format and implementation for any input sparse matrix at runtime. For this purpose, SMAT uses a learning model to quickly predict the best combination of the matrix feature parameters. The model is generated in an off-line stage by a machine learning method, using a training set of more than 2000 matrices from the UF sparse matrix collection. Our experiments show that SMAT achieves up to 51 GFLOPS in single precision and 37 GFLOPS in double precision on mainstream x86 multi-core processors, both more than 3 times faster than the Intel MKL library. We also demonstrate its adaptability in an algebraic multigrid solver from the Hypre library, with a reported performance improvement of more than 20%.
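To make the CSR interface and the idea of feature-driven format selection concrete, here is a minimal Python sketch. It is not SMAT's code or API: the function names, the row-length features, and the variance threshold are all assumptions chosen for illustration, and the hand-written rule in `pick_format` merely stands in for the learned model the abstract describes.

```python
from statistics import mean, pvariance


def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i + 1] is the slice of col_idx/values
    that holds the nonzeros of row i.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def row_length_features(row_ptr):
    """Extract simple structural features from the CSR row pointer.

    These particular features are an illustrative assumption,
    not the feature set used by SMAT.
    """
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    return {
        "mean_nnz_per_row": mean(lengths),
        "var_nnz_per_row": pvariance(lengths),
        "max_nnz_per_row": max(lengths),
    }


def pick_format(features, var_threshold=1.0):
    """Hypothetical stand-in for a learned model.

    Near-uniform row lengths suggest a padded format such as ELL;
    otherwise fall back to CSR. The threshold is arbitrary.
    """
    if features["var_nnz_per_row"] <= var_threshold:
        return "ELL"
    return "CSR"


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 1.0, 1.0]

y = csr_spmv(row_ptr, col_idx, values, x)
features = row_length_features(row_ptr)
chosen = pick_format(features)
print(y, chosen)
```

The caller always supplies CSR arrays, and the format decision is made from features computed on that input. In a real auto-tuner the selected format would then trigger a conversion and a format-specific kernel, which this sketch leaves out.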