Dask: Parallel computation with blocked algorithms and task scheduling

M Rocklin - SciPy, 2015 - conference.scipy.org.s3.amazonaws …
Dask enables parallel and out-of-core computation. We couple blocked algorithms with
dynamic and memory aware task scheduling to achieve a parallel and out-of-core NumPy …

Wukong: A scalable and locality-enhanced framework for serverless parallel computing

B Carver, J Zhang, A Wang, A Anwar, P Wu… - Proceedings of the 11th …, 2020 - dl.acm.org
Executing complex, burst-parallel, directed acyclic graph (DAG) jobs poses a major
challenge for serverless execution frameworks, which will need to rapidly scale and …

Algebraic methods for interactive proof systems

C Lund, L Fortnow, H Karloff, N Nisan - Journal of the ACM (JACM), 1992 - dl.acm.org
A new algebraic technique for the construction of interactive proof systems is presented. Our
technique is used to prove that every language in the polynomial-time hierarchy has an …

Serverless linear algebra

V Shankar, K Krauth, K Vodrahalli, Q Pu… - Proceedings of the 11th …, 2020 - dl.acm.org
Datacenter disaggregation provides numerous benefits to both the datacenter operator and
the application designer. However, switching from the server-centric model to a …

XKaapi: A runtime system for data-flow task programming on heterogeneous architectures

T Gautier, JVF Lima, N Maillard… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org
Most recent HPC platforms have heterogeneous nodes composed of multi-core CPUs and
accelerators, like GPUs. Programming such nodes is typically based on a combination of …

An efficient multicore implementation of a novel HSS-structured multifrontal solver using randomized sampling

P Ghysels, XS Li, FH Rouet, S Williams… - SIAM Journal on Scientific …, 2016 - SIAM
We present a sparse linear system solver that is based on a multifrontal variant of Gaussian
elimination and exploits low-rank approximation of the resulting dense frontal matrices. We …

Swift/T: Large-scale application composition via distributed-memory dataflow processing

JM Wozniak, TG Armstrong, M Wilde… - 2013 13th IEEE/ACM …, 2013 - ieeexplore.ieee.org
Many scientific applications are conceptually built up from independent component tasks as
a parameter study, optimization, or other search. Large batches of these tasks may be …

The singular value decomposition: Anatomy of optimizing an algorithm for extreme scale

J Dongarra, M Gates, A Haidar, J Kurzak, P Luszczek… - SIAM Review, 2018 - SIAM
The computation of the singular value decomposition, or SVD, has a long history with many
improvements over the years, both in its implementations and algorithmically. Here, we …

Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA

G Bosilca, A Bouteiller, A Danalis… - … on Parallel and …, 2011 - ieeexplore.ieee.org
We present a method for developing dense linear algebra algorithms that seamlessly scales
to thousands of cores. It can be done with our project called DPLASMA (Distributed …

Achieving high performance on supercomputers with a sequential task-based programming model

E Agullo, O Aumage, M Faverge… - … on Parallel and …, 2017 - ieeexplore.ieee.org
The emergence of accelerators as standard computing resources on supercomputers and
the subsequent architectural complexity increase revived the need for high-level parallel …