Automated performance modeling of HPC applications using machine learning

J Sun, G Sun, S Zhan, J Zhang… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Automated performance modeling and performance prediction of parallel programs are
highly valuable in many use cases, such as in guiding task management and job …

Performance analysis of sparse matrix-vector multiplication (SpMV) on graphics processing units (GPUs)

S AlAhmadi, T Mohammed, A Albeshri, I Katib… - Electronics, 2020 - mdpi.com
Graphics processing units (GPUs) have delivered a remarkable performance for a variety of
high performance computing (HPC) applications through massive parallelism. One such …

VBSF: a new storage format for SIMD sparse matrix–vector multiplication on modern processors

Y Li, P Xie, X Chen, J Liu, B Yang, S Li, C Gong… - The Journal of …, 2020 - Springer
Sparse matrix–vector multiplication (SpMV) is one of the most indispensable kernels of
solving problems in numerous applications, but the performance of SpMV is limited by the …

Adaptive SpMV/SpMSpV on GPUs for input vectors of varied sparsity

M Li, Y Ao, C Yang - IEEE Transactions on Parallel and …, 2020 - ieeexplore.ieee.org
Despite numerous efforts for optimizing the performance of Sparse Matrix and Vector
Multiplication (SpMV) on modern hardware architectures, little work has been done on its sparse …

SpTFS: Sparse tensor format selection for MTTKRP via deep learning

Q Sun, Y Liu, M Dun, H Yang, Z Luan… - … Conference for High …, 2020 - ieeexplore.ieee.org
Canonical polyadic decomposition (CPD) is one of the most common tensor computations
adopted in many scientific applications. The major bottleneck of CPD is matricized tensor …

Leveraging one-sided communication for sparse triangular solvers

N Ding, S Williams, Y Liu, XS Li - Proceedings of the 2020 SIAM Conference …, 2020 - SIAM
In this paper, we implement and evaluate a one-sided communication-based distributed-
memory sparse triangular solve (SpTRSV). SpTRSV is used in conjunction with Sparse LU …

Synergistic CPU-FPGA acceleration of sparse linear algebra

M Soltaniyeh, RP Martin, S Nagarakatte - arXiv preprint arXiv:2004.13907, 2020 - arxiv.org
This paper describes REAP, a software-hardware approach that enables high performance
sparse linear algebra computations on a cooperative CPU-FPGA platform. REAP carefully …

A fully structure-driven performance analysis of sparse matrix-vector multiplication

P Sandhu, C Verbrugge, L Hendren - Proceedings of the ACM/SPEC …, 2020 - dl.acm.org
Sparse matrix-vector multiplication (SpMV) is an important kernel in many scientific, machine-
learning, and other compute-intensive applications. Performance characteristics, however …

MMSparse: 2D partitioning of sparse matrix based on mathematical morphology

Z Tan, W Ji, J Gao, Y Zhao, A Benatia, Y Wang… - Future Generation …, 2020 - Elsevier
A sparse matrix is any matrix with enough zeros that it pays to take advantage of them. The
computational efficiency of sparse matrix–vector multiplication (SpMV) is significantly …

a-Tucker: Input-adaptive and matricization-free Tucker decomposition for dense tensors on CPUs and GPUs

M Li, C Xiao, C Yang - arXiv preprint arXiv:2010.10131, 2020 - arxiv.org
Tucker decomposition is one of the most popular models for analyzing and compressing
large-scale tensorial data. Existing Tucker decomposition algorithms usually rely on a single …