Data access complexity: The red/blue pebble game revisited

G Kwasniewski, M Kabić, M Besta… - Proceedings of the …, 2019 - dl.acm.org

We propose COSMA: a parallel matrix-matrix multiplication algorithm that is near
communication-optimal for all combinations of matrix dimensions, processor counts, and …

被引用次数：114 相关文章所有 25 个版本

[PDF] arxiv.org

On the parallel i/o optimality of linear algebra kernels: Near-optimal matrix factorizations

G Kwasniewski, M Kabic, T Ben-Nun… - Proceedings of the …, 2021 - dl.acm.org

Matrix factorizations are among the most important building blocks of scientific computing.
However, state-of-the-art libraries are not communication-optimal, underutilizing current …

被引用次数：22 相关文章所有 22 个版本

[PDF] arxiv.org

Pebbles, graphs, and a pinch of combinatorics: Towards tight I/O lower bounds for statically analyzable programs

G Kwasniewski, T Ben-Nun, L Gianinazzi… - Proceedings of the 33rd …, 2021 - dl.acm.org

Determining I/O lower bounds is a crucial step in obtaining communication-efficient parallel
algorithms, both across the memory hierarchy and between processors. Current approaches …

被引用次数：18 相关文章所有 22 个版本

[PDF] acm.org

Reducing communication in the conjugate gradient method: a case study on high-order finite elements

M Karp, N Jansson, A Podobas, P Schlatter… - Proceedings of the …, 2022 - dl.acm.org

Currently, a major bottleneck for several scientific computations is communication, both
communication between different processors, so-called horizontal communication, and …

被引用次数：6 相关文章所有 2 个版本

[PDF] acm.org

On characterizing the data access complexity of programs

V Elango, F Rastello, LN Pouchet… - Proceedings of the …, 2015 - dl.acm.org

Technology trends will cause data movement to account for the majority of energy
expenditure and execution time on emerging computers. Therefore, computational …

被引用次数：36 相关文章所有 12 个版本

[PDF] arxiv.org

Deinsum: Practically I/O optimal multi-linear algebra

AN Ziogas, G Kwasniewski, T Ben-Nun… - … Conference for High …, 2022 - ieeexplore.ieee.org

Multilinear algebra kernel performance on modern massively-parallel systems is determined
mainly by data movement. However, deriving data movement-optimal distributed schedules …

被引用次数：6 相关文章所有 22 个版本

[PDF] arxiv.org

A tight i/o lower bound for matrix multiplication

TM Smith, B Lowery, J Langou… - arXiv preprint arXiv …, 2017 - arxiv.org

A tight lower bound for required I/O when computing an ordinary matrix-matrix multiplication
on a processor with two layers of memory is established. Prior work obtained weaker lower …

被引用次数：20 相关文章所有 3 个版本

[PDF] researchgate.net

On the parallel I/O optimality of linear algebra kernels: Near-optimal LU factorization

G Kwasniewski, T Ben-Nun, AN Ziogas… - Proceedings of the 26th …, 2021 - dl.acm.org

Dense linear algebra kernels are fundamental components of many scientific computing
applications. In this work we present a novel method of deriving parallel I/O lower bounds for …

被引用次数：12 相关文章所有 20 个版本

[PDF] arxiv.org

On characterizing the data movement complexity of computational DAGs for parallel execution

V Elango, F Rastello, LN Pouchet… - Proceedings of the 26th …, 2014 - dl.acm.org

Technology trends are making the cost of data movement increasingly dominant, both in
terms of energy and time, over the cost of performing arithmetic operations in computer …

被引用次数：20 相关文章所有 17 个版本

[PDF] arxiv.org

On the hardness of red-blue pebble games

PA Papp, R Wattenhofer - Proceedings of the 32nd ACM Symposium on …, 2020 - dl.acm.org

Red-blue pebble games model the computation cost of a two-level memory hierarchy. We
present various hardness results in different red-blue pebbling variants, with a focus on the …

被引用次数：9 相关文章所有 8 个版本

高级搜索

QQ 群