HammingMesh: a network topology for large-scale deep learning

T Hoefler, T Bonato, D De Sensi… - … Conference for High …, 2022 - ieeexplore.ieee.org
Numerous microarchitectural optimizations unlocked tremendous processing power for
deep neural networks that in turn fueled the AI revolution. With the exhaustion of such …

Symmetric block-cyclic distribution: Fewer communications leads to faster dense Cholesky factorization

O Beaumont, P Duchon… - … Conference for High …, 2022 - ieeexplore.ieee.org
We consider the distributed Cholesky factorization on homogeneous nodes. Inspired by
recent progress on asymptotic lower bounds on the total communication volume required to …
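
As context for the title, a minimal sketch of the classical 2D block-cyclic layout that a symmetric variant would build on: tile (i, j) is owned by process (i mod p_r, j mod p_c) on a p_r x p_c process grid. This is only the standard layout; the symmetric remapping proposed in the paper is not reproduced here, and the grid sizes below are illustrative.

    def owner(i, j, p_r, p_c):
        # Classical 2D block-cyclic layout: tile (i, j) lives on process (i mod p_r, j mod p_c).
        return (i % p_r, j % p_c)

    # Illustrative example: an 8x8 tile grid mapped onto a 2x3 process grid.
    layout = [[owner(i, j, 2, 3) for j in range(8)] for i in range(8)]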

Sparse Hamming Graph: A Customizable Network-on-Chip Topology

P Iff, M Besta, M Cavalcante, T Fischer… - 2023 60th ACM/IEEE …, 2023 - ieeexplore.ieee.org
Chips with hundreds to thousands of cores require scalable networks-on-chip (NoCs).
Customization of the NoC topology is necessary to reach the diverse design goals of …

I/O-optimal algorithms for symmetric linear algebra kernels

O Beaumont, L Eyraud-Dubois, J Langou… - Proceedings of the 34th …, 2022 - dl.acm.org
In this paper, we consider two fundamental symmetric kernels in linear algebra: the
Cholesky factorization and the symmetric rank-k update (SYRK), with the classical three …
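
The "classical three" in the truncated abstract presumably refers to the three-nested-loop formulations of these kernels; as a minimal sketch in that style, a plain SYRK update (illustrative only, not the I/O-optimal schedule derived in the paper):

    def syrk_lower(A, C):
        # Symmetric rank-k update, C := C + A @ A^T, written as the classical
        # three nested loops and touching only the lower triangle of C.
        n, k = len(A), len(A[0])
        for i in range(n):
            for j in range(i + 1):
                for p in range(k):
                    C[i][j] += A[i][p] * A[j][p]
        return C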

Deinsum: Practically I/O optimal multi-linear algebra

AN Ziogas, G Kwasniewski, T Ben-Nun… - … Conference for High …, 2022 - ieeexplore.ieee.org
Multilinear algebra kernel performance on modern massively-parallel systems is determined
mainly by data movement. However, deriving data movement-optimal distributed schedules …
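
As a concrete, hypothetical instance of the multilinear kernels such tools target, a single einsum contraction (here the MTTKRP kernel) whose distributed, data-movement-efficient scheduling is the kind of problem the paper addresses; shapes are illustrative:

    import numpy as np

    # MTTKRP on a 3-way tensor X with factor matrices B and C:
    # M[i, r] = sum_{j, k} X[i, j, k] * B[j, r] * C[k, r]
    I, J, K, R = 30, 40, 50, 8
    X = np.random.rand(I, J, K)
    B = np.random.rand(J, R)
    C = np.random.rand(K, R)
    M = np.einsum('ijk,jr,kr->ir', X, B, C)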

Using Additive Modifications in LU Factorization Instead of Pivoting

N Lindquist, P Luszczek, J Dongarra - Proceedings of the 37th …, 2023 - dl.acm.org
Direct solvers for dense systems of linear equations commonly use partial pivoting to ensure
numerical stability. However, pivoting can introduce significant performance overheads …
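
For reference, a minimal sketch of LU with partial row pivoting, in which the explicit pivot search and row swaps are the overheads that motivate pivoting-free alternatives; this is the standard baseline, not the additive-modification method of the paper:

    import numpy as np

    def lu_partial_pivoting(A):
        # Gaussian elimination with partial row pivoting: returns piv, L, U with A[piv] = L @ U.
        A = A.astype(float).copy()
        n = A.shape[0]
        piv = np.arange(n)
        for k in range(n - 1):
            p = k + np.argmax(np.abs(A[k:, k]))  # pivot search down column k
            if p != k:                           # row swap: the data movement pivoting adds
                A[[k, p]] = A[[p, k]]
                piv[[k, p]] = piv[[p, k]]
            A[k+1:, k] /= A[k, k]
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
        return piv, np.tril(A, -1) + np.eye(n), np.triu(A)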

Threshold Pivoting for Dense LU Factorization

N Lindquist, M Gates, P Luszczek… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
LU factorization is a key approach for solving large, dense systems of linear equations.
Partial row pivoting is commonly used to ensure numerical stability; however, the data …
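
A common formulation of the threshold test, well known from sparse direct solvers (whether this exact form is the one adopted in the paper cannot be confirmed from the snippet): a candidate pivot a_{kk} is accepted without a row swap whenever

    \[
      |a_{kk}| \;\ge\; \tau \cdot \max_{i \ge k} |a_{ik}|, \qquad \tau \in (0, 1],
    \]

with \tau = 1 recovering partial pivoting and smaller \tau trading some stability for fewer swaps.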

Demystifying Higher-Order Graph Neural Networks

M Besta, F Scheidl, L Gianinazzi, S Klaiman… - arXiv preprint arXiv …, 2024 - arxiv.org
Higher-order graph neural networks (HOGNNs) are an important class of GNN models that
harness polyadic relations between vertices beyond plain edges. They have been used to …

I/O-optimal cache-oblivious sparse matrix-sparse matrix multiplication

N Gleinig, M Besta, T Hoefler - 2022 IEEE International Parallel …, 2022 - ieeexplore.ieee.org
Data movements between different levels of the memory hierarchy (I/O-transitions, or simply
I/Os) are a critical performance bottleneck in modern computing. Therefore, it is a problem of …
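
For contrast with the cache-oblivious algorithm in the paper, a minimal Gustavson-style row-by-row SpGEMM baseline, with sparse rows stored as {column: value} dicts (illustrative only):

    def spgemm_gustavson(A_rows, B_rows):
        # C = A @ B, row by row: for each nonzero A[i, k], accumulate A[i, k] * B[k, :].
        C_rows = []
        for a_row in A_rows:
            acc = {}
            for k, a_val in a_row.items():
                for j, b_val in B_rows[k].items():
                    acc[j] = acc.get(j, 0.0) + a_val * b_val
            C_rows.append(acc)
        return C_rows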

Brief Announcement: Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds

H Al Daas, G Ballard, L Grigori, S Kumar… - Proceedings of the 34th …, 2022 - dl.acm.org
Communication lower bounds have long been established for matrix multiplication
algorithms. However, most methods of asymptotic analysis have either ignored constant …
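
For context, the classical asymptotic per-processor bandwidth lower bounds for multiplying n x n matrices on P processors with local memory M; the paper's contribution concerns the tight leading constants, which are not reproduced here:

    \[
      W = \Omega\!\left(\frac{n^{3}}{P\sqrt{M}}\right) \quad\text{(memory-dependent)},
      \qquad
      W = \Omega\!\left(\frac{n^{2}}{P^{2/3}}\right) \quad\text{(memory-independent)}.
    \]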