[CITATION] Transparent load balancing of MPI programs using OmpSs-2@Cluster and DLB

JA Mena, O Shaaban, V Lopez, M Garcia… - 51st International …, 2022 - paul-carpenter.org
Load imbalance is a long-standing source of inefficiency in high performance
computing. The situation has only got worse as applications and systems increase in …

Itoyori: Reconciling Global Address Space and Global Fork-Join Task Parallelism

S Shiina, K Taura - Proceedings of the International Conference for High …, 2023 - dl.acm.org
This paper introduces Itoyori, a task-parallel runtime system designed to tackle the
challenge of scaling task parallelism (more specifically, nested fork-join parallelism) beyond …

Scalable tasking runtime with parallelized builders for explicit message passing architectures

X Gao, L Chen, H Wang, H Cui, X Feng - Parallel Computing, 2025 - Elsevier
The sequential task flow (STF) model introduces implicit data dependences to exploit
task-based parallelism, simplifying programming but also introducing non-negligible runtime …

Toward a Dynamic Allocation Strategy for Deadline‐Oriented Resource and Job Management in HPC Systems

B Linnert, CAF De Rose… - … and Computation: Practice …, 2025 - Wiley Online Library
As high‐performance computing (HPC) becomes a tool used in many different workflows,
quality of service (QoS) becomes increasingly important. In many cases, this includes the …

Automatic aggregation of subtask accesses for nested OpenMP-style tasks

O Shaaban, J Aguilar, V Beltran… - 2022 IEEE 34th …, 2022 - ieeexplore.ieee.org
Task-based programming is a high performance and productive model to express
parallelism. Tasks encapsulate work to be executed across multiple cores or offloaded to …

Towards achieving transparent malleability thanks to MPI process virtualization

H Taboada, R Pereira, J Jaeger, JB Besnard - International Conference on …, 2023 - Springer
The field of High-Performance Computing is rapidly evolving, driven by the race for
computing power and the emergence of new architectures. Despite these changes, the …

On Memory Codelets: Prefetching, Recoding, Moving and Streaming Data

D Fox, JM Diaz, X Li - arXiv preprint arXiv:2302.00115, 2023 - arxiv.org
For decades, memory capabilities have scaled up much slower than compute capabilities,
leaving memory utilization as a major bottleneck. Prefetching and cache hierarchies mitigate …

On the use of hierarchical task for heterogeneous architectures

G Lucas - 2023 - theses.hal.science
In the last decades, the computing power of high-performance platforms has grown
exponentially at the expense of increased complexity. Programming such platforms to take …

Transparent load balancing of MPI programs using OmpSs-2@Cluster and DLB

J Aguilar Mena, O Shaaban, V Lopez… - Proceedings of the 51st …, 2022 - dl.acm.org
Load imbalance is a long-standing source of inefficiency in high performance computing.
The situation has only got worse as applications and systems increase in complexity, e.g. …

The DEEP-SEA project: a software stack for heterogeneous and modular supercomputers

E Suarez, N Eicker, HC Hoppe - PARS-Mitteilungen, 2024 - dl.gi.de
Today's most powerful supercomputers achieve their performance through heterogeneous
system architectures that integrate CPUs with accelerators, especially GPUs, and advanced …