Towards dynamic resource management with MPI sessions and PMIx

D Huber, M Streubel, I Comprés, M Schulz… - Proceedings of the 29th …, 2022 - dl.acm.org
Job management software on peta- and exascale supercomputers continues to provide static
resource allocations, from a program's start until its end. Dynamic resource allocation and …

A survey on malleability solutions for high-performance distributed computing

JI Aliaga, M Castillo, S Iserte, I Martín-Álvarez… - Applied Sciences, 2022 - mdpi.com
Maintaining a high rate of productivity, in terms of completed jobs per unit of time, in High-
Performance Computing (HPC) facilities is a cornerstone in the next generation of exascale …

DMRlib: easy-coding and efficient resource management for job malleability

S Iserte, R Mayo, ES Quintana-Ortí… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Process malleability has proved to have a highly positive impact on the resource utilization
and global productivity in data centers compared with the conventional static resource …

Dynamic spawning of MPI processes applied to malleability

I Martín-Álvarez, JI Aliaga, M Castillo… - … Journal of High …, 2024 - journals.sagepub.com
Malleability allows computing facilities to adapt their workloads through resource
management systems to maximize the throughput of the facility and the efficiency of the …

Extending Slurm for dynamic resource-aware adaptive batch scheduling

M Chadha, J John, M Gerndt - 2020 IEEE 27th International …, 2020 - ieeexplore.ieee.org
With the growing constraints on power budget and increasing hardware failure rates, the
operation of future exascale systems faces several challenges. Towards this, resource …

Adaptive parallel applications: from shared memory architectures to fog computing (2002–2022)

G Galante, R da Rosa Righi - Cluster Computing, 2022 - Springer
The evolution of parallel architectures points to dynamic environments where the number of
available resources or configurations may vary during the execution of applications. This …

rFaaS: RDMA-enabled FaaS platform for serverless high-performance computing

M Copik, K Taranov, A Calotoiu… - arXiv preprint arXiv …, 2021 - ww.unixer.de
The rigid MPI programming model and batch scheduling dominate high-performance
computing. While clouds brought new levels of elasticity into the world of computing …

Extending parallel programming patterns with adaptability features

G Galante, R da Rosa Righi, C de Andrade - Cluster Computing, 2024 - Springer
Today, all computers have some degree of usable parallelism. Modern computers are
explicitly equipped with hardware support for parallelism, such as multiple nodes …

DROM: Enabling efficient and effortless malleability for resource managers

M D'Amico, M Garcia-Gasulla, V López… - … Proceedings of the …, 2018 - dl.acm.org
In the design of future HPC systems, research in resource management is showing an
increasing interest in a more dynamic control of the available resources. It has been proven …

Transparent resource elasticity for task-based cluster environments with work stealing

J Posner, C Fohry - 50th International Conference on Parallel …, 2021 - dl.acm.org
Resource elasticity allows to dynamically change the resources of running jobs, which may
significantly improve the throughput on supercomputers. Elasticity requires support from …