The sparse polyhedral framework: Composing compiler-generated inspector-executor code

MM Strout, M Hall, C Olschanowsky - Proceedings of the IEEE, 2018 - ieeexplore.ieee.org
Irregular applications such as big graph analysis, material simulations, molecular dynamics
simulations, and finite element analysis have performance problems due to their use of …

SparseP: Towards efficient sparse matrix vector multiplication on real processing-in-memory architectures

C Giannoula, I Fernandez, JG Luna, N Koziris… - Proceedings of the …, 2022 - dl.acm.org
Several manufacturers have already started to commercialize near-bank
Processing-In-Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures …

Towards efficient sparse matrix vector multiplication on real processing-in-memory architectures

C Giannoula, I Fernandez, J Gómez-Luna… - ACM SIGMETRICS …, 2022 - dl.acm.org
Several manufacturers have already started to commercialize near-bank
Processing-In-Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures …

SMASH: Co-designing software compression and hardware-accelerated indexing for efficient sparse matrix operations

K Kanellopoulos, N Vijaykumar, C Giannoula… - Proceedings of the …, 2019 - dl.acm.org
Important workloads, such as machine learning and graph analytics applications, heavily
involve sparse linear algebra operations. These operations use sparse matrix compression …

Evaluation criteria for sparse matrix storage formats

D Langr, P Tvrdik - IEEE Transactions on parallel and …, 2015 - ieeexplore.ieee.org
When authors present new storage formats for sparse matrices, they usually focus mainly on
a single evaluation criterion, which is the performance of sparse matrix-vector multiplication …

Performance optimization using partitioned SpMV on GPUs and multicore CPUs

W Yang, K Li, Z Mo, K Li - IEEE Transactions on Computers, 2014 - ieeexplore.ieee.org
This paper presents a sparse matrix partitioning strategy to improve the performance of
SpMV on GPUs and multicore CPUs. This method has wide adaptability for different types of …

Accelerating framework of transformer by hardware design and model compression co-optimization

P Qi, EHM Sha, Q Zhuge, H Peng… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
State-of-the-art Transformer-based models, with gigantic parameters, are difficult to be
accommodated on resource constrained embedded devices. Moreover, with the …

Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication

A Buluç, S Williams, L Oliker… - 2011 IEEE International …, 2011 - ieeexplore.ieee.org
On multicore architectures, the ratio of peak memory bandwidth to peak floating-point
performance (byte:flop ratio) is decreasing as core counts increase, further limiting the …

Loop and data transformations for sparse matrix code

A Venkat, M Hall, M Strout - ACM SIGPLAN Notices, 2015 - dl.acm.org
This paper introduces three new compiler transformations for representing and transforming
sparse matrix computations and their data representations. In cooperation with run-time …

CSX: an extended compression format for SpMV on shared memory systems

K Kourtis, V Karakasis, G Goumas, N Koziris - ACM SIGPLAN Notices, 2011 - dl.acm.org
The Sparse Matrix-Vector multiplication (SpMV) kernel scales poorly on shared memory
systems with multiple processing units due to the streaming nature of its data access pattern …