DAGuE: A generic distributed DAG engine for high performance computing

G Bosilca, A Bouteiller, A Danalis, T Herault… - Parallel Computing, 2012 - Elsevier
The frenetic development of the current architectures places a strain on the current state-of-
the-art programming environments. Harnessing the full potential of such architectures is a …

A hybridization methodology for high-performance linear algebra software for GPUs

E Agullo, C Augonnet, J Dongarra, H Ltaief… - GPU Computing Gems …, 2012 - Elsevier
Publisher Summary: This chapter presents a hybridization methodology for the development
of high-performance linear algebra software for graphics processing units (GPUs). The …

Enabling in-situ execution of coupled scientific workflow on multi-core platform

F Zhang, C Docan, M Parashar, S Klasky… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
Emerging scientific application workflows are composed of heterogeneous coupled
component applications that simulate different aspects of the physical phenomena being …

Are static schedules so bad? A case study on Cholesky factorization

E Agullo, O Beaumont… - 2016 IEEE …, 2016 - ieeexplore.ieee.org
Our goal is to provide an analysis and comparison of static and dynamic strategies for task
graph scheduling on platforms consisting of heterogeneous and unrelated resources, such …

PLASMA: Parallel linear algebra software for multicore using OpenMP

J Dongarra, M Gates, A Haidar, J Kurzak… - ACM Transactions on …, 2019 - dl.acm.org
The recent version of the Parallel Linear Algebra Software for Multicore Architectures
(PLASMA) library is based on tasks with dependencies from the OpenMP standard. The …

Dynamic task execution on shared and distributed memory architectures

A YarKhan - 2012 - trace.tennessee.edu
Multicore architectures with high core counts have come to dominate the world of high
performance computing, from shared memory machines to the largest distributed memory …

Parallel hierarchical hybrid linear solvers for emerging computing platforms

E Agullo, L Giraud… - Comptes …, 2011 - comptes-rendus.academie-sciences …
The design of the extreme-scale platforms expected to become available in the coming
decade will represent the convergence of technological trends and will define the …

High performance matrix inversion based on LU factorization for multicore architectures

J Dongarra, M Faverge, H Ltaief… - Proceedings of the 2011 …, 2011 - dl.acm.org
The goal of this paper is to present an efficient implementation of an explicit matrix inversion
of general square matrices on multicore computer architecture. The inversion procedure is …

Flexible linear algebra development and scheduling with Cholesky factorization

A Haidar, A YarKhan, C Cao… - 2015 IEEE 17th …, 2015 - ieeexplore.ieee.org
Modern high performance computing environments are composed of networks of compute
nodes that often contain a variety of heterogeneous compute resources, such as multicore …

Task-based sparse hybrid linear solver for distributed memory heterogeneous architectures

E Agullo, L Giraud, S Nakov - … Workshops, Grenoble, France, August 24-26 …, 2017 - Springer
Heterogeneity is emerging as one of the most challenging characteristics of today's parallel
environments. However, not many fully-featured advanced numerical, scientific libraries …