The emerging high-performance computing Exascale supercomputing system, which is anticipated to be available in 2020, will unravel many scientific mysteries. This extraordinary …
Reliability is a serious concern for future extreme-scale high-performance computing (HPC) systems. While the HPC community has developed various resilience solutions, the solution …
MPI implementations are becoming increasingly complex and highly tunable, and thus scalability limitations can come from numerous sources. The MPI Tools Interface (MPI_T) …
The Argo project is a DOE initiative for designing a modular operating system/runtime for the next generation of supercomputers. A key focus area in this project is power management …
In this document, we develop a structured approach to the management of HPC resilience based on the concept of resilience-based design patterns. A design pattern is a general …
MU Ashraf, FA Eassa, AA Albeshri… - International Journal of …, 2018 - academia.edu
The emerging Exascale supercomputing system, expected by 2020, will unravel many scientific mysteries. This extreme computing system will achieve a thousand-fold increase in …
Implementing an in situ workflow involves several challenges related to data placement, task scheduling, efficient communications, scalability, and reliability. Most of the current …
With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for its scientific computing needs …