Predictive reliability and fault management in exascale systems: State of the art and perspectives

R Canal, C Hernandez, R Tornero, A Cilardo… - ACM Computing …, 2020 - dl.acm.org
… Finally, we identify the promising paths to meet the reliability levels of … Spatial support
vector regression to detect silent errors in the exascale era. In Proceedings of the 16th IEEE/ACM …

The landscape of exascale research: A data-driven literature analysis

S Heldens, P Hijma, BV Werkhoven… - ACM Computing …, 2020 - dl.acm.org
… that exascale systems will experience faults and errors more fre… However, silent data corruption
(SDC) might require more … a global memory address space partitioned across the nodes. …

Sentiment analysis based error detection for large-scale systems

KA Alharthi, A Jhumka, S Di… - 2021 51st Annual …, 2021 - ieeexplore.ieee.org
… are designed/utilized towards exascale computing, inevitably … on one-day period (27-March-2017)
due to space limit and … using partial labels based on PU learning and Support Vector

Response of HPC hardware to neutron radiation at the dawn of exascale

A Bustos, AJ Rubio-Montero, R Méndez… - The Journal of …, 2023 - Springer
silent data corruption detectors by leveraging support vector … The right replication level to
detect and correct silent errors at … their spatial locality and providing the mean relative error (…

Resiliency in Numerical Algorithm Design for Extreme Scale Simulations (Dagstuhl Seminar 20101)

L Giraud, U Rüde, L Stals - 2020 - drops.dagstuhl.de
… scientists with expertise in exascale computing to discuss novel … for detection, containment
and mitigation of silent data … -resolution in space or time and the error estimators themselves …

Calculation of the high-energy neutron flux for anticipating errors and recovery techniques in exascale supercomputer centres

H Asorey, R Mayo-Garcia - The Journal of Supercomputing, 2023 - Springer
… , silent data corruption (SDC) errors, or simply silent errors (… the detection interval required
to detect the error in the error … with their spatial locality and provide the mean relative error (…

Doubt and redundancy kill soft errors—Towards detection and correction of silent data corruption in task-based numerical software

P Samfass, T Weinzierl, A Reinarz… - 2021 IEEE/ACM 11th …, 2021 - ieeexplore.ieee.org
… , we have to assume that exascale machines will fail frequently … from the other over a longer
period, it is reasonable to … follow-up space-time predictors to which our error detection and …

Supercomputing, Exascale Computing, High Performance Computing

A Berea - Encyclopedia of Big Data, 2022 - Springer
… ing pixels in order to pick out spatially close parts of features. … investment and involves
human error. Automated methods … used supervised classifiers include support vector machines, …

[BOOK][B] Designing Efficient and Resilient Lossy Compressors for Large-Scale Scientific Computing

S Li - 2020 - search.proquest.com
… latter one which is soft error or silent data corruption (SDC). … characteristics in both time and
space dimension. For the soft … , aims to develop MD simulation at exascale to address the key …

A visual comparison of silent error propagation

Z Li, H Menon, K Mohror, S Liu, L Guo… - … on Visualization and …, 2022 - ieeexplore.ieee.org
… the complicated spatial and temporal correlation between error … that affects computation for
only a short period of time, … roll back for recovering once an error is detected). However, most …