The occurrence of radiation-induced soft errors in electronic computing systems can either affect non-essential system functionalities or violate safety-critical conditions, which might …
CK Chang, S Lym, N Kelly… - … conference for high …, 2018 - ieeexplore.ieee.org
We address two important concerns for the analysis of the behavior of applications in the presence of hardware errors: (1) when is it important to model how hardware faults lead to …
CK Chang, S Lym, N Kelly… - 2018 48th Annual …, 2018 - ieeexplore.ieee.org
Single bit-flip has been the most popular error model for resilience studies with fault injection. We use RTL gate-level fault injection to show that this model fails to cover many …
Electronic computing systems are integrating modern multicore processors and GPUs to run complex software stacks in different life-critical systems, including health …
Fault Injection (FI) is a method used to quantify the reliability and resilience of a system by assessing the system's ability to detect, locate, and/or mitigate fault occurrences. At the …
CK Chang, G Li, M Erez - 2019 IEEE/ACM 9th Workshop on …, 2019 - ieeexplore.ieee.org
Hardware faults (i.e., soft errors) are projected to increase in modern HPC systems. The faults often lead to error propagation in programs and result in silent data corruptions (SDCs) …
To take advantage of the performance enhancements provided by multicore processors, new instruction set architectures (ISAs) and parallel programming libraries have been …
Resilient computation has been an emerging topic in the field of high-performance computing (HPC). In particular, studies show that tolerating faults on leadership-class …
H Jiang, S Ruan, B Fang, Y Wang… - 2023 IEEE 28th Pacific …, 2023 - ieeexplore.ieee.org
Soft errors have become one of the main concerns for the resilience of HPC applications, as these errors can cause HPC applications to generate serious outcomes such as silent data …