A framework for general sparse matrix–matrix multiplication on GPUs and heterogeneous processors

W Liu, B Vinter - Journal of Parallel and Distributed Computing, 2015 - Elsevier
General sparse matrix–matrix multiplication (SpGEMM) is a fundamental building block for
numerous applications such as algebraic multigrid method (AMG), breadth first search and …

Hypergraph partitioning for sparse matrix-matrix multiplication

G Ballard, A Druinsky, N Knight… - ACM Transactions on …, 2016 - dl.acm.org
We propose a fine-grained hypergraph model for sparse matrix-matrix multiplication
(SpGEMM), a key computational kernel in scientific computing and data analysis whose …

The input/output complexity of triangle enumeration

R Pagh, F Silvestri - Proceedings of the 33rd ACM SIGMOD-SIGACT …, 2014 - dl.acm.org
We consider the well-known problem of enumerating all triangles of an undirected graph.
Our focus is on determining the input/output (I/O) complexity of this problem. Let E be the …

Fast Matrix Multiplication for Query Processing

X Hu - Proceedings of the ACM on Management of Data, 2024 - dl.acm.org
This paper studies how to use fast matrix multiplication to speed up query processing. As
observed, computing a two-table join and then projecting away the join attribute is …

Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication

G Ballard, A Druinsky, N Knight… - Proceedings of the 27th …, 2015 - dl.acm.org
The performance of parallel algorithms for sparse matrix-matrix multiplication is typically
determined by the amount of interprocessor communication performed, which in turn …

[PDF][PDF] Parallel and scalable sparse basic linear algebra subprograms

W Liu - 2015 - nbi.ku.dk
Sparse basic linear algebra subprograms (BLAS) are fundamental building blocks for
numerous scientific computations and graph applications. Compared with Dense BLAS …

Optimization of Sparse Matrix Computation for Algebraic Multigrid on GPUs

Y Wang, F Chang, B Wei, J Gao, W Ji - ACM Transactions on Architecture …, 2024 - dl.acm.org
AMG is one of the most efficient and widely used methods for solving sparse linear systems.
The computational process of AMG mainly consists of a series of iterative calculations of …

The I/O complexity of Strassen's matrix multiplication with recomputation

G Bilardi, L De Stefani - Workshop on Algorithms and Data Structures, 2017 - Springer
Abstract A tight\varOmega ((n/M)^\log _2 7 M) lower bound is derived on the I/O complexity
of Strassen's algorithm to multiply two n * n matrices, in a two-level storage hierarchy with M …

Fine-grained I/O complexity via reductions: New lower bounds, faster algorithms, and a time hierarchy

ED Demaine, A Lincoln, QC Liu, J Lynch… - arXiv preprint arXiv …, 2017 - arxiv.org
This paper initiates the study of I/O algorithms (minimizing cache misses) from the
perspective of fine-grained complexity (conditional polynomial lower bounds). Specifically …

The I/O Complexity of Attention, or How Optimal is Flash Attention?

B Saha, C Ye - arXiv preprint arXiv:2402.07443, 2024 - arxiv.org
Self-attention is at the heart of the popular Transformer architecture, yet suffers from
quadratic time and memory complexity. The breakthrough FlashAttention algorithm revealed …