Predicting the silent data corruption vulnerability of instructions in programs

N Yang, Y Wang - 2019 IEEE 25th International Conference on …, 2019 - ieeexplore.ieee.org
… An SVM classifier is learned by the signature of residue to detect soft errors [17]. MACORD,
an adaptive, online and machine learning based SDC detection framework is proposed to …

Physics-based checksums for silent-error detection in PDE solvers

M Salloum, JR Mayo, RC Armstrong - Euro-Par 2019: Parallel Processing …, 2020 - Springer
… , we test parallel solvers in a simple resilience framework with emulated silent errors. As in
previous … MACORD: online adaptive machine learning framework for silent error detection. In: …

Ground-truth prediction to accelerate soft-error impact analysis for iterative methods

BO Mutlu, G Kestor, A Cristal, O Unsal… - 2019 IEEE 26th …, 2019 - ieeexplore.ieee.org
errors can lead to application/system crash or even to silent … We present a machine
learning based approach to observe a … detectors as potential predictors: • Adaptive Impact-driven …

[PDF] Whole program Adaptive Error Detection and Mitigation

G Gopalakrishnan - 2020 - osti.gov
… Many types of software-based soft error detectors have been … [9], and detectors that fit a
machine learning model around the “… (DMR) can rigorously detect silent data corruptions by …

Workshop report on basic research needs for scientific machine learning: Core technologies for artificial intelligence

N Baker, F Alexander, T Bremer, A Hagberg… - 2019 - osti.gov
… heart of DOE application codes to be more adaptive, usually through the use of simple …
MACORD) [210] presents a SciML-based framework for detecting silent errors in HPC machines

Machine Learning Applied to Soft Error Assessment in Multicore Systems

F Rocha da Rosa, L Ost, R Reis… - Soft Error Reliability …, 2020 - Springer
… This paper shows an average error detection of 90% considering … an online adaptive SDC
detection algorithm using machine … exploration framework using machine learning supervised …

Towards end-to-end SDC detection for HPC applications equipped with lossy compression

S Li, S Di, K Zhao, X Liang, Z Chen… - … Conference on Cluster …, 2020 - ieeexplore.ieee.org
… [27] proposed a machine learning based detection method in terms of the AID … Cappello,
MACORD: Online adaptive machine learning framework for silent error detection,” in 2017 IEEE …

Efficient detection of silent data corruption in HPC applications with synchronization-free message verification

G Zhang, Y Liu, H Yang, D Qian - The Journal of Supercomputing, 2022 - Springer
… Subasi O, Di S, Balaprakash P et al (2017) MACORD: online adaptive machine learning
framework for silent error detection. In: Proceedings of 2017 IEEE international conference on …

Resiliency in numerical algorithm design for extreme scale simulations

E Agullo, M Altenbernd, H Anzt… - … Journal of High …, 2022 - journals.sagepub.com
… A good example would be Silent Data Corruption (SDC) errors … detection methods have
been proposed such as Adaptive … -driven approaches via machine learning techniques can be …

FPDetect: Efficient Reasoning About Stencil Programs Using Selective Direct Evaluation

A Das, S Krishnamoorthy, I Briggs… - ACM Transactions on …, 2020 - dl.acm.org
… (DMR) can rigorously detect silent data corruptions by utilizing a … unlike those are generated
through machine learning [45… AID is an adaptive SDC detector wherein a best-fit prediction …