CHA Costa, Y Park, BS Rosenburg, CY Cher… - Proceedings of the …, 2014 - dl.acm.org
Today's HPC systems use two mechanisms to address main-memory errors. Error-correcting
codes make correctable errors transparent to software, while checkpoint/restart (CR) …