HammingMesh: a network topology for large-scale deep learning

T Hoefler, T Bonato, D De Sensi… - … Conference for High …, 2022 - ieeexplore.ieee.org
Numerous microarchitectural optimizations unlocked tremendous processing power for
deep neural networks that in turn fueled the AI revolution. With the exhaustion of such …

Symmetric block-cyclic distribution: Fewer communications leads to faster dense Cholesky factorization

O Beaumont, P Duchon… - … Conference for High …, 2022 - ieeexplore.ieee.org
We consider the distributed Cholesky factorization on homogeneous nodes. Inspired by
recent progress on asymptotic lower bounds on the total communication volume required to …
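
As context for the title, a minimal sketch of the classical 2D block-cyclic layout that a symmetric variant would build on: tile (i, j) is owned by process (i mod p_r, j mod p_c) on a p_r x p_c process grid. This is only the standard layout; the symmetric remapping proposed in the paper is not reproduced here, and the grid sizes below are illustrative.

    def owner(i, j, p_r, p_c):
        # Classical 2D block-cyclic layout: tile (i, j) lives on process (i mod p_r, j mod p_c).
        return (i % p_r, j % p_c)

    # Illustrative example: an 8x8 tile grid mapped onto a 2x3 process grid.
    layout = [[owner(i, j, 2, 3) for j in range(8)] for i in range(8)]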

Sparse Hamming Graph: A Customizable Network-on-Chip Topology

P Iff, M Besta, M Cavalcante, T Fischer… - 2023 60th ACM/IEEE …, 2023 - ieeexplore.ieee.org
Chips with hundreds to thousands of cores require scalable networks-on-chip (NoCs).
Customization of the NoC topology is necessary to reach the diverse design goals of …

I/O-optimal algorithms for symmetric linear algebra kernels

O Beaumont, L Eyraud-Dubois, J Langou… - Proceedings of the 34th …, 2022 - dl.acm.org
In this paper, we consider two fundamental symmetric kernels in linear algebra: the
Cholesky factorization and the symmetric rank-k update (SYRK), with the classical three …
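
The "classical three" in the truncated abstract presumably refers to the three-nested-loop formulations of these kernels; as a minimal sketch in that style, a plain SYRK update (illustrative only, not the I/O-optimal schedule derived in the paper):

    def syrk_lower(A, C):
        # Symmetric rank-k update, C := C + A @ A^T, written as the classical
        # three nested loops and touching only the lower triangle of C.
        n, k = len(A), len(A[0])
        for i in range(n):
            for j in range(i + 1):
                for p in range(k):
                    C[i][j] += A[i][p] * A[j][p]
        return C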

Deinsum: Practically I/O optimal multi-linear algebra

AN Ziogas, G Kwasniewski, T Ben-Nun… - … Conference for High …, 2022 - ieeexplore.ieee.org
Multilinear algebra kernel performance on modern massively-parallel systems is determined
mainly by data movement. However, deriving data movement-optimal distributed schedules …
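
As a concrete, hypothetical instance of the multilinear kernels such tools target, a single einsum contraction (here the MTTKRP kernel) whose distributed, data-movement-efficient scheduling is the kind of problem the paper addresses; shapes are illustrative:

    import numpy as np

    # MTTKRP on a 3-way tensor X with factor matrices B and C:
    # M[i, r] = sum_{j, k} X[i, j, k] * B[j, r] * C[k, r]
    I, J, K, R = 30, 40, 50, 8
    X = np.random.rand(I, J, K)
    B = np.random.rand(J, R)
    C = np.random.rand(K, R)
    M = np.einsum('ijk,jr,kr->ir', X, B, C)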

Using Additive Modifications in LU Factorization Instead of Pivoting

N Lindquist, P Luszczek, J Dongarra - Proceedings of the 37th …, 2023 - dl.acm.org
Direct solvers for dense systems of linear equations commonly use partial pivoting to ensure
numerical stability. However, pivoting can introduce significant performance overheads …
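
For reference, a minimal sketch of LU with partial row pivoting, in which the explicit pivot search and row swaps are the overheads that motivate pivoting-free alternatives; this is the standard baseline, not the additive-modification method of the paper:

    import numpy as np

    def lu_partial_pivoting(A):
        # Gaussian elimination with partial row pivoting: returns piv, L, U with A[piv] = L @ U.
        A = A.astype(float).copy()
        n = A.shape[0]
        piv = np.arange(n)
        for k in range(n - 1):
            p = k + np.argmax(np.abs(A[k:, k]))  # pivot search down column k
            if p != k:                           # row swap: the data movement pivoting adds
                A[[k, p]] = A[[p, k]]
                piv[[k, p]] = piv[[p, k]]
            A[k+1:, k] /= A[k, k]
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
        return piv, np.tril(A, -1) + np.eye(n), np.triu(A)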

Threshold Pivoting for Dense LU Factorization

N Lindquist, M Gates, P Luszczek… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
LU factorization is a key approach for solving large, dense systems of linear equations.
Partial row pivoting is commonly used to ensure numerical stability; however, the data …
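
A common formulation of the threshold test, well known from sparse direct solvers (whether this exact form is the one adopted in the paper cannot be confirmed from the snippet): a candidate pivot a_{kk} is accepted without a row swap whenever

    \[
      |a_{kk}| \;\ge\; \tau \cdot \max_{i \ge k} |a_{ik}|, \qquad \tau \in (0, 1],
    \]

with \tau = 1 recovering partial pivoting and smaller \tau trading some stability for fewer swaps.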

Demystifying Higher-Order Graph Neural Networks

M Besta, F Scheidl, L Gianinazzi, S Klaiman… - arXiv preprint arXiv …, 2024 - arxiv.org
Higher-order graph neural networks (HOGNNs) are an important class of GNN models that
harness polyadic relations between vertices beyond plain edges. They have been used to …

I/O-optimal cache-oblivious sparse matrix-sparse matrix multiplication

N Gleinig, M Besta, T Hoefler - 2022 IEEE International Parallel …, 2022 - ieeexplore.ieee.org
Data movements between different levels of the memory hierarchy (I/O-transitions, or simply
I/Os) are a critical performance bottleneck in modern computing. Therefore, it is a problem of …
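
For contrast with the cache-oblivious algorithm in the paper, a minimal Gustavson-style row-by-row SpGEMM baseline, with sparse rows stored as {column: value} dicts (illustrative only):

    def spgemm_gustavson(A_rows, B_rows):
        # C = A @ B, row by row: for each nonzero A[i, k], accumulate A[i, k] * B[k, :].
        C_rows = []
        for a_row in A_rows:
            acc = {}
            for k, a_val in a_row.items():
                for j, b_val in B_rows[k].items():
                    acc[j] = acc.get(j, 0.0) + a_val * b_val
            C_rows.append(acc)
        return C_rows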

Brief Announcement: Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds

H Al Daas, G Ballard, L Grigori, S Kumar… - Proceedings of the 34th …, 2022 - dl.acm.org
Communication lower bounds have long been established for matrix multiplication
algorithms. However, most methods of asymptotic analysis have either ignored constant …
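
For context, the classical asymptotic per-processor bandwidth lower bounds for multiplying n x n matrices on P processors with local memory M; the paper's contribution concerns the tight leading constants, which are not reproduced here:

    \[
      W = \Omega\!\left(\frac{n^{3}}{P\sqrt{M}}\right) \quad\text{(memory-dependent)},
      \qquad
      W = \Omega\!\left(\frac{n^{2}}{P^{2/3}}\right) \quad\text{(memory-independent)}.
    \]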