Reinit++: Evaluating the Performance of Global-Restart Recovery Methods for MPI Fault Tolerance

G Georgakoudis, L Guo, I Laguna - International Conference on High …, 2020 - Springer
Scaling supercomputers comes with an increase in failure rates due to the increasing
number of hardware components. In standard practice, applications are made resilient …

Reinit++: Evaluating the Performance of Global-Restart Recovery Methods For MPI Fault Tolerance

G Georgakoudis, L Guo, I Laguna - arXiv e-prints, 2021 - ui.adsabs.harvard.edu

Reinit++: Evaluating the Performance of Global-Restart Recovery Methods for MPI Fault Tolerance

G Georgakoudis, L Guo, I Laguna - International Conference on High …, 2020 - dl.acm.org

Reinit++: Evaluating the Performance of Global-Restart Recovery Methods For MPI Fault Tolerance

G Georgakoudis, L Guo, I Laguna - arXiv preprint arXiv:2102.06896, 2021 - arxiv.org

Reinit++: Evaluating the Performance of Global-Restart Recovery Methods for MPI Fault Tolerance

G Georgakoudis, L Guo, I Laguna - High Performance Computing - ncbi.nlm.nih.gov

Reinit++: Evaluating the Performance of Global-Restart Recovery Methods for MPI Fault Tolerance

G Georgakoudis, L Guo, I Laguna - … International Conference, ISC …, 2020 - europepmc.org
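
The abstract's opening claim, that failure rates grow with the number of hardware components, can be made concrete with a common back-of-the-envelope model. The sketch below assumes node failures are independent and exponentially distributed, so the system fails as soon as any node fails: the system failure rate is N times the per-node rate, and the system MTBF is the node MTBF divided by N. The function name `system_mtbf_hours` and the 5-year node MTBF are hypothetical illustrations, not figures or code from the paper.

```python
def system_mtbf_hours(node_mtbf_hours: float, num_nodes: int) -> float:
    """System MTBF under an assumed model of independent, exponential node failures.

    Each node fails at rate 1 / node_mtbf_hours. Because the whole job is lost
    when any single node fails, the rates add: system rate = num_nodes / node_mtbf.
    """
    if node_mtbf_hours <= 0 or num_nodes <= 0:
        raise ValueError("node_mtbf_hours and num_nodes must be positive")
    system_rate = num_nodes / node_mtbf_hours  # expected failures per hour
    return 1.0 / system_rate


if __name__ == "__main__":
    # Hypothetical per-node MTBF of 5 years (43,800 hours); not data from the paper.
    node_mtbf = 5 * 365 * 24
    for n in (1_000, 10_000, 100_000):
        mtbf = system_mtbf_hours(node_mtbf, n)
        print(f"{n:>7} nodes -> system MTBF ~ {mtbf:.2f} h")
```

Under these assumptions, growing a machine from 1,000 to 100,000 nodes shrinks the expected time between failures by a factor of 100, which is why recovery cost (for example, global-restart versus full job restart) matters more at scale.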