SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures

BLIS: A framework for rapidly instantiating BLAS functionality

FG Van Zee, RA Van De Geijn - ACM Transactions on Mathematical …, 2015 - dl.acm.org

The BLAS-like Library Instantiation Software (BLIS) framework is a new infrastructure for
rapidly instantiating Basic Linear Algebra Subprograms (BLAS) functionality. Its fundamental …

Cited by 450 · Related articles · All 7 versions

[PDF] utk.edu

DAGuE: A generic distributed DAG engine for high performance computing

G Bosilca, A Bouteiller, A Danalis, T Herault… - Parallel Computing, 2012 - Elsevier

The frenetic development of the current architectures places a strain on the current state-of-
the-art programming environments. Harnessing the full potential of such architectures is a …

Cited by 502 · Related articles · All 35 versions

[PDF] arxiv.org

A class of parallel tiled linear algebra algorithms for multicore architectures

A Buttari, J Langou, J Kurzak, J Dongarra - Parallel Computing, 2009 - Elsevier

As multicore systems continue to gain ground in the high performance computing world,
linear algebra algorithms have to be reformulated or new algorithms have to be developed …

Cited by 722 · Related articles · All 24 versions

[PDF] researchgate.net

Elemental: A new framework for distributed memory dense matrix computations

J Poulson, B Marker, RA Van de Geijn… - ACM Transactions on …, 2013 - dl.acm.org

Parallelizing dense matrix computations to distributed memory architectures is a well-
studied subject and generally considered to be among the best understood domains of …

Cited by 328 · Related articles · All 13 versions

[PDF] academia.edu

Hierarchical task-based programming with StarSs

J Planas, RM Badia, E Ayguadé… - … International Journal of …, 2009 - journals.sagepub.com

Programming models for multicore and many-core systems are listed as one of the main
challenges in the near future for computing research. These programming models should be …

Cited by 294 · Related articles · All 5 versions

[PDF] arxiv.org

Parallel tiled QR factorization for multicore architectures

A Buttari, J Langou, J Kurzak… - … Practice and Experience, 2008 - Wiley Online Library

As multicore systems continue to gain ground in the high‐performance computing world,
linear algebra algorithms have to be reformulated or new algorithms have to be developed …

Cited by 280 · Related articles · All 31 versions

[PDF] psu.edu

Programming matrix algorithms-by-blocks for thread-level parallelism

G Quintana-Ortí, ES Quintana-Ortí… - ACM Transactions on …, 2009 - dl.acm.org

With the emergence of thread-level parallelism as the primary means for continued
performance improvement, the programmability issue has reemerged as an obstacle to the …

Cited by 195 · Related articles · All 14 versions

[PDF] cmu.edu

[PDF] Provably good multicore cache performance for divide-and-conquer algorithms

GE Blelloch, RA Chowdhury, PB Gibbons… - Proceedings of the …, 2008 - cs.cmu.edu

This paper presents a multicore-cache model that reflects the reality that multicore
processors have both per-processor private (L1) caches and a large shared (L2) cache on …

Cited by 162 · Related articles · All 14 versions

[PDF] kaust.edu.sa

Extreme-scale task-based Cholesky factorization toward climate and weather prediction applications

Q Cao, Y Pei, K Akbudak, A Mikhalev… - Proceedings of the …, 2020 - dl.acm.org

Climate and weather can be predicted statistically via geospatial Maximum Likelihood
Estimates (MLE), as an alternative to running large ensembles of forward models. The MLE …

Cited by 52 · Related articles · All 9 versions

[PDF] psu.edu

SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks

E Chan, FG Van Zee, P Bientinesi… - Proceedings of the 13th …, 2008 - dl.acm.org

This paper describes SuperMatrix, a runtime system that parallelizes matrix operations for
SMP and/or multi-core architectures. We use this system to demonstrate how code …

Cited by 147 · Related articles · All 8 versions
