E Yao, R Wang, M Chen, G Tan… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
Fault tolerance overhead of high performance computing (HPC) applications is becoming
critical to the efficient utilization of HPC systems at large scale. Today's HPC applications …