Extending the Domain of Transparent Checkpoint-Restart for Large-scale HPC

R Garg - 2019 - repository.library.northeastern.edu
While large-scale HPC systems are critical for expediting progress in many scientific fields,
exascale computing will face severe resilience challenges. Checkpoint-restart is an important …