Using machine learning techniques to evaluate multicore soft error reliability

FR da Rosa, R Garibotti, L Ost… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Virtual platform frameworks have been extended to allow earlier soft error analysis of more
realistic multicore systems (i.e., real software stacks and state-of-the-art ISAs). The high …

SOFIA: An automated framework for early soft error assessment, identification, and mitigation

J Gava, V Bandeira, F Rosa, R Garibotti, R Reis… - Journal of Systems …, 2022 - Elsevier
The occurrence of radiation-induced soft errors in electronic computing systems can either
affect non-essential system functionalities or violate safety-critical conditions, which might …

Evaluating and accelerating high-fidelity error injection for HPC

CK Chang, S Lym, N Kelly… - … conference for high …, 2018 - ieeexplore.ieee.org
We address two important concerns for the analysis of the behavior of applications in the
presence of hardware errors: (1) when is it important to model how hardware faults lead to …

Hamartia: A fast and accurate error injection framework

CK Chang, S Lym, N Kelly… - 2018 48th Annual …, 2018 - ieeexplore.ieee.org
Single bit-flip has been the most popular error model for resilience studies with fault
injection. We use RTL gate-level fault injection to show that this model fails to cover many …

Non-intrusive fault injection techniques for efficient soft error vulnerability analysis

V Bandeira, F Rosa, R Reis… - 2019 IFIP/IEEE 27th …, 2019 - ieeexplore.ieee.org
Electronic computing systems are integrating modern multicore processors and GPUs
aiming to perform complex software stacks in different life-critical systems, including health …

A Survey of QEMU-Based Fault Injection Tools & Techniques for Emulating Physical Faults

YB Bekele, DB Limbrick, JC Kelly - IEEE Access, 2023 - ieeexplore.ieee.org
Fault Injection (FI) is a method used to quantify the reliability and resilience of a system by
assessing the system's ability to detect, locate, and/or mitigate fault occurrences. At the …

Evaluating compiler IR-level selective instruction duplication with realistic hardware errors

CK Chang, G Li, M Erez - 2019 IEEE/ACM 9th Workshop on …, 2019 - ieeexplore.ieee.org
Hardware faults (i.e., soft errors) are projected to increase in modern HPC systems. The faults
often lead to error propagation in programs and result in silent data corruptions (SDCs) …

Extensive evaluation of programming models and ISAs impact on multicore soft error reliability

F Rosa, V Bandeira, R Reis, L Ost - Proceedings of the 55th Annual …, 2018 - dl.acm.org
To take advantage of the performance enhancements provided by multicore processors,
new instruction set architectures (ISAs) and parallel programming libraries have been …

Chaser: An enhanced fault injection tool for tracing soft errors in MPI applications

Q Guan, X Hu, T Grove, B Fang, H Jiang… - 2020 50th Annual …, 2020 - ieeexplore.ieee.org
Resilient computation has been an emerging topic in the field of high-performance
computing (HPC). In particular, studies show that tolerating faults on leadership-class …

VISILIENCE: An Interactive Visualization Framework for Resilience Analysis using Control-Flow Graph

H Jiang, S Ruan, B Fang, Y Wang… - 2023 IEEE 28th Pacific …, 2023 - ieeexplore.ieee.org
Soft errors have become one of the main concerns for the resilience of HPC applications, as
these errors can cause HPC applications to generate serious outcomes such as silent data …