Parallel programming with migratable objects: Charm++ in practice

B Acun, A Gupta, N Jain, A Langer… - SC'14: Proceedings …, 2014 - ieeexplore.ieee.org
The advent of petascale computing has introduced new challenges (e.g., heterogeneity,
system failure) for programming scalable parallel applications. Increased complexity and …

Open problems in queueing theory inspired by datacenter computing

M Harchol-Balter - Queueing Systems, 2021 - Springer
Datacenter operations today provide a plethora of new queueing and scheduling problems.
The notion of a “job” has become more general and multi-dimensional. The ways in which …

Proteus: agile ML elasticity through tiered reliability in dynamic resource markets

A Harlap, A Tumanov, A Chung, GR Ganger… - Proceedings of the …, 2017 - dl.acm.org
Many shared computing clusters allow users to utilize excess idle resources at lower cost or
priority, with the proviso that some or all may be taken away at any time. But, exploiting such …

Towards dynamic resource management with MPI sessions and PMIx

D Huber, M Streubel, I Comprés, M Schulz… - Proceedings of the 29th …, 2022 - dl.acm.org
Job management software on peta- and exascale supercomputers continues to provide static
resource allocations, from a program's start until its end. Dynamic resource allocation and …

A batch system with efficient adaptive scheduling for malleable and evolving applications

S Prabhakaran, M Neumann, S Rinke… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
The throughput of supercomputers depends not only on efficient job scheduling but also on
the type of jobs that form the workload. Malleable jobs are most favourable for a cluster as …

DMRlib: easy-coding and efficient resource management for job malleability

S Iserte, R Mayo, ES Quintana-Ortí… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Process malleability has proved to have a highly positive impact on the resource utilization
and global productivity in data centers compared with the conventional static resource …

Extending Slurm for dynamic resource-aware adaptive batch scheduling

M Chadha, J John, M Gerndt - 2020 IEEE 27th International …, 2020 - ieeexplore.ieee.org
With the growing constraints on power budget and increasing hardware failure rates, the
operation of future exascale systems faces several challenges. Towards this, resource …

Adaptive parallel applications: from shared memory architectures to fog computing (2002–2022)

G Galante, R da Rosa Righi - Cluster Computing, 2022 - Springer
The evolution of parallel architectures points to dynamic environments where the number of
available resources or configurations may vary during the execution of applications. This …

DROM: Enabling efficient and effortless malleability for resource managers

M D'Amico, M Garcia-Gasulla, V López… - … Proceedings of the …, 2018 - dl.acm.org
In the design of future HPC systems, research in resource management is showing an
increasing interest in a more dynamic control of the available resources. It has been proven …

DMR API: Improving cluster productivity by turning applications into malleable

S Iserte, R Mayo, ES Quintana-Ortí, V Beltran… - Parallel Computing, 2018 - Elsevier
Adaptive workloads can change on-the-fly the configuration of their jobs, in terms of
number of processes. To carry out these job reconfigurations, we have designed a …