Efficient detection of silent data corruption in HPC applications with synchronization-free message verification

G Zhang, Y Liu, H Yang, D Qian - The Journal of Supercomputing, 2022 - Springer
… Berrocal E, Bautista-Gomez L, Di S et al (2015) Lightweight silent data corruption
detection based on runtime data analysis for HPC applications. In: Proceedings of the 24th …

Mitigating silent data corruptions in HPC applications across multiple program inputs

Y Huang, S Guo, S Di, G Li… - … Performance Computing …, 2022 - ieeexplore.ieee.org
… that the fault leads to a silent data corruption (SDC) which is our … will detect the mismatch
at runtime, and hence detect SDC. … : A lightweight continuous framework for HPC applications

PEPPA-X: finding program test inputs to bound silent data corruption vulnerability in HPC applications

MH Rahman, A Shamji, S Guo, G Li - … for High Performance Computing …, 2021 - dl.acm.org
… to measure them at runtime. We track the executed times of each static instruction at runtime,
then … Hauberk: Lightweight silent data corruption error detector for GPGPU. In International …

Understanding silent data corruptions in a large production CPU population

S Wang, G Zhang, J Wei, Y Wang, J Wu… - Proceedings of the 29th …, 2023 - dl.acm.org
Silent Data Corruption (SDC) in processors can lead to vari… a library widely used in HPC
applications. The wide impacts of … control the CPU temperature at run time to mitigate this type of …

Understanding Silent Data Corruption in Processors for Mitigating its Effects

S Wang, G Zhang, J Wei, Y Wang, J Wu… - ACM Transactions on …, 2024 - dl.acm.org
Lightweight silent data corruption detection based on runtime data analysis for HPC
applications. In Proceedings of the 24th International Symposium on High-Performance …

On the Detection of Silent Data Corruptions in HPC Applications Using Redundant Multi-threading

D Pérez, T Ropars, E Meneses - European Conference on Parallel …, 2020 - Springer
… ) to detect Silent Data Corruptions in HPC applications. To understand if it can be a viable
solution in an HPC … a data corruption visible to the outside world. As such, we propose to only …

Association rule mining based algorithm for recovery of silent data corruption in convolutional neural network data storage

M Ramzanpour, SA Ludwig - 2020 IEEE Symposium Series on …, 2020 - ieeexplore.ieee.org
… ) applications to check for deviations, which could be a potential soft error. Their proposed …
for iterative based HPC applications consists of two steps. By time series analysis of each data

Estimating silent data corruption rates using a two-level model

SKS Hari, P Rech, T Tsai, M Stephenson… - arXiv preprint arXiv …, 2020 - arxiv.org
high-performance computing systems and safety-critical embedded systems. These transient
faults can propagate to the application … If a program takes more than 3x its expected runtime

A visual comparison of silent error propagation

Z Li, H Menon, K Mohror, S Liu, L Guo… - … on Visualization and …, 2022 - ieeexplore.ieee.org
… Analyzing the resiliency of HPC applications in extreme-scale computing to silent data … ,
Lightweight silent data corruption detection based on runtime data analysis for HPC applications,…

Silent Data Corruption in Robot Operating System: A Case for End-to-End System-Level Fault Analysis Using Autonomous UAVs

YS Hsiao, Z Wan, T Jia, R Ghosal… - … on Computer-Aided …, 2023 - ieeexplore.ieee.org
lightweight mitigation technique. To this end, we propose two software-directed and lightweight
… to compute units and schedules tasks at runtime. Each ROS node is treated as a process …