Asynchronous Multi-Level Checkpointing: An Enabler of Reproducibility using Checkpoint History Analytics

K Assogba, B Nicolae, H Van Dam… - … of the SC'23 Workshops of …, 2023 - dl.acm.org
High-performance computing applications are increasingly integrating checkpointing
libraries for reproducibility analytics. However, capturing an entire checkpoint history for …

AdapCK: Optimizing I/O for Checkpointing on Large-Scale High Performance Computing Systems

J Jia, Y Liu, Y Liu, Y Chen, F Lin - European Conference on Parallel …, 2024 - Springer
With the scaling-up of high-performance computing (HPC) systems, resilience has
become an important challenge. As a widely used resilience technique for HPC systems …