Distributed-memory task execution and dependence tracking within DAGuE and the DPLASMA project

G Bosilca, A Bouteiller, A Danalis, T Herault… - Parallel Computing, 2012 - Elsevier

The frenetic development of the current architectures places a strain on the current
state-of-the-art programming environments. Harnessing the full potential of such architectures is a …

Cited by 503 - Related articles - All 35 versions

[PDF] hal.science

A hybridization methodology for high-performance linear algebra software for GPUs

E Agullo, C Augonnet, J Dongarra, H Ltaief… - GPU Computing Gems …, 2012 - Elsevier

Publisher Summary: This chapter presents a hybridization methodology for the development
of high-performance linear algebra software for graphics processing units (GPUs). The …

Cited by 164 - Related articles - All 6 versions

[PDF] lbl.gov

Enabling in-situ execution of coupled scientific workflow on multi-core platform

F Zhang, C Docan, M Parashar, S Klasky… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org

Emerging scientific application workflows are composed of heterogeneous coupled
component applications that simulate different aspects of the physical phenomena being …

Cited by 116 - Related articles - All 12 versions

[PDF] hal.science

Are static schedules so bad? A case study on Cholesky factorization

E Agullo, O Beaumont… - 2016 IEEE …, 2016 - ieeexplore.ieee.org

Our goal is to provide an analysis and comparison of static and dynamic strategies for task
graph scheduling on platforms consisting of heterogeneous and unrelated resources, such …

Cited by 64 - Related articles - All 10 versions

[PDF] acm.org

PLASMA: Parallel linear algebra software for multicore using OpenMP

J Dongarra, M Gates, A Haidar, J Kurzak… - ACM Transactions on …, 2019 - dl.acm.org

The recent version of the Parallel Linear Algebra Software for Multicore Architectures
(PLASMA) library is based on tasks with dependencies from the OpenMP standard. The …

Cited by 73 - Related articles - All 4 versions

[PDF] tennessee.edu

Dynamic task execution on shared and distributed memory architectures

A YarKhan - 2012 - trace.tennessee.edu

Multicore architectures with high core counts have come to dominate the world of high
performance computing, from shared memory machines to the largest distributed memory …

Cited by 63 - Related articles - All 5 versions

[PDF] academie-sciences.fr

Parallel hierarchical hybrid linear solvers for emerging computing platforms

E Agullo, L Giraud… - Comptes …, 2011 - comptes-rendus.academie-sciences …

The design of the extreme-scale platforms expected to become available in the
coming decade will represent the convergence of technological trends and will define the …

Cited by 40 - Related articles - All 8 versions

[PDF] 131.254.254.45

High performance matrix inversion based on LU factorization for multicore architectures

J Dongarra, M Faverge, H Ltaief… - Proceedings of the 2011 …, 2011 - dl.acm.org

The goal of this paper is to present an efficient implementation of an explicit matrix inversion
of general square matrices on multicore computer architecture. The inversion procedure is …

Cited by 35 - Related articles - All 19 versions

[PDF] psu.edu

Flexible linear algebra development and scheduling with Cholesky factorization

A Haidar, A YarKhan, C Cao… - 2015 IEEE 17th …, 2015 - ieeexplore.ieee.org

Modern high performance computing environments are composed of networks of compute
nodes that often contain a variety of heterogeneous compute resources, such as multicore …

Cited by 12 - Related articles - All 9 versions

[PDF] researchgate.net

Task-based sparse hybrid linear solver for distributed memory heterogeneous architectures

E Agullo, L Giraud, S Nakov - … Workshops, Grenoble, France, August 24-26 …, 2017 - Springer

Heterogeneity is emerging as one of the most challenging characteristics of today's parallel
environments. However, not many fully-featured advanced numerical, scientific libraries …

Cited by 14 - Related articles - All 5 versions
