Matrix factorizations are among the most important building blocks of scientific computing. However, state-of-the-art libraries are not communication-optimal, underutilizing current …
Determining I/O lower bounds is a crucial step in obtaining communication-efficient parallel algorithms, both across the memory hierarchy and between processors. Current approaches …
Currently, a major bottleneck for several scientific computations is communication, both between different processors (so-called horizontal communication) and …
Technology trends will cause data movement to account for the majority of energy expenditure and execution time on emerging computers. Therefore, computational …
Multilinear algebra kernel performance on modern massively-parallel systems is determined mainly by data movement. However, deriving data movement-optimal distributed schedules …
A tight lower bound for required I/O when computing an ordinary matrix-matrix multiplication on a processor with two layers of memory is established. Prior work obtained weaker lower …
Dense linear algebra kernels are fundamental components of many scientific computing applications. In this work we present a novel method of deriving parallel I/O lower bounds for …
Technology trends are making the cost of data movement increasingly dominant, both in terms of energy and time, over the cost of performing arithmetic operations in computer …
PA Papp, R Wattenhofer - Proceedings of the 32nd ACM Symposium on …, 2020 - dl.acm.org
Red-blue pebble games model the computation cost of a two-level memory hierarchy. We present various hardness results in different red-blue pebbling variants, with a focus on the …