Pebbles, graphs, and a pinch of combinatorics: Towards tight I/O lower bounds for statically analyzable programs

G Kwasniewski, T Ben-Nun, L Gianinazzi… - Proceedings of the 33rd …, 2021 - dl.acm.org
Determining I/O lower bounds is a crucial step in obtaining communication-efficient parallel
algorithms, both across the memory hierarchy and between processors. Current approaches …

Reducing communication in the conjugate gradient method: a case study on high-order finite elements

M Karp, N Jansson, A Podobas, P Schlatter… - Proceedings of the …, 2022 - dl.acm.org
Currently, a major bottleneck for several scientific computations is communication, both
communication between different processors, so-called horizontal communication, and …

Automated derivation of parametric data movement lower bounds for affine programs

A Olivry, J Langou, LN Pouchet… - Proceedings of the 41st …, 2020 - dl.acm.org
Researchers and practitioners have for long worked on improving the computational
complexity of algorithms, focusing on reducing the number of operations needed to perform …

Brief Announcement: Red-Blue Pebbling with Multiple Processors: Time, Communication and Memory Trade-offs

T Böhnlein, PA Papp, AJN Yzelman - … of the 36th ACM Symposium on …, 2024 - dl.acm.org
The well-studied red-blue pebble game models the execution of an arbitrary computational
DAG by a single processor over a two-level memory hierarchy. We present a natural …

On characterizing the data access complexity of programs

V Elango, F Rastello, LN Pouchet… - Proceedings of the …, 2015 - dl.acm.org
Technology trends will cause data movement to account for the majority of energy
expenditure and execution time on emerging computers. Therefore, computational …

A high-fidelity flow solver for unstructured meshes on field-programmable gate arrays: Design, evaluation, and future challenges

M Karp, A Podobas, T Kenter, N Jansson… - … Conference on High …, 2022 - dl.acm.org
The impending termination of Moore's law motivates the search for new forms of computing
to continue the performance scaling we have grown accustomed to. Among the many …

I/O lower bounds for auto-tuning of convolutions in CNNs

X Zhang, J Xiao, G Tan - Proceedings of the 26th ACM SIGPLAN …, 2021 - dl.acm.org
Convolution is the most time-consuming part in the computation of convolutional neural
networks (CNNs), which have achieved great successes in numerous practical applications …

The red-blue pebble game on trees and dags with large input

N Gleinig, T Hoefler - International Colloquium on Structural Information …, 2022 - Springer
Data movements between different levels of a memory hierarchy (I/Os) are a principal
performance bottleneck. This is particularly noticeable in computations that have low …

Tessellating star stencils

L Yuan, S Huang, Y Zhang, H Cao - Proceedings of the 48th International …, 2019 - dl.acm.org
Stencil computations represent a very common class of nested loops in scientific and
engineering applications. The exhaustively studied tiling is one of the most powerful …

On the hardness of red-blue pebble games

PA Papp, R Wattenhofer - Proceedings of the 32nd ACM Symposium on …, 2020 - dl.acm.org
Red-blue pebble games model the computation cost of a two-level memory hierarchy. We
present various hardness results in different red-blue pebbling variants, with a focus on the …