Silent data corruptions at scale

HD Dixit, S Pendharkar, M Beadon, C Mason… - arXiv preprint arXiv …, 2021 - arxiv.org
Silent Data Corruption (SDC) can have negative impact on large-scale infrastructure
services. SDCs are not captured by error reporting mechanisms within a Central Processing …

Understanding silent data corruptions in a large production CPU population

S Wang, G Zhang, J Wei, Y Wang, J Wu… - Proceedings of the 29th …, 2023 - dl.acm.org
Silent Data Corruption (SDC) in processors can lead to various application-level issues,
such as incorrect calculations and even data loss. Since traditional techniques are not …

Low-cost program-level detectors for reducing silent data corruptions

SKS Hari, SV Adve, H Naeimi - IEEE/IFIP international …, 2012 - ieeexplore.ieee.org
With technology scaling, transient faults are becoming an increasing threat to hardware
reliability. Commodity systems must be made resilient to these in-field faults through very …

Lightweight silent data corruption detection based on runtime data analysis for HPC applications

E Berrocal, L Bautista-Gomez, S Di, Z Lan… - Proceedings of the 24th …, 2015 - dl.acm.org
Next-generation supercomputers are expected to have more components and, at the same
time, consume several times less energy per operation. Consequently, the number of soft …

Modeling soft-error propagation in programs

G Li, K Pattabiraman, SKS Hari… - 2018 48th Annual …, 2018 - ieeexplore.ieee.org
As technology scales to lower feature sizes, devices become more susceptible to soft errors.
Soft errors can lead to silent data corruptions (SDCs), seriously compromising the reliability …

nZDC: A compiler technique for near zero silent data corruption

M Didehban, A Shrivastava - Proceedings of the 53rd Annual Design …, 2016 - dl.acm.org
Exponentially growing rate of soft errors makes reliability a major concern in modern
processor design. Since software-oriented approaches offer flexible protection even in off …

IPAS: Intelligent protection against silent output corruption in scientific applications

I Laguna, M Schulz, DF Richards, J Calhoun… - Proceedings of the 2016 …, 2016 - dl.acm.org
This paper presents IPAS, an instruction duplication technique that protects scientific
applications from silent data corruption (SDC) in their output. The motivation for IPAS is that …

Error detector placement for soft computation

A Thomas, K Pattabiraman - 2013 43rd Annual IEEE/IFIP …, 2013 - ieeexplore.ieee.org
The scaling of Silicon devices has exacerbated the unreliability of modern computer
systems, and power constraints have necessitated the involvement of software in hardware …

Demystifying soft error assessment strategies on ARM CPUs: Microarchitectural fault injection vs. neutron beam experiments

A Chatzidimitriou, P Bodmann… - 2019 49th Annual …, 2019 - ieeexplore.ieee.org
Fault injection in early microarchitecture-level simulation CPU models and beam
experiments on the final physical CPU chip are two established methodologies to access the …

Automatically diagnosing and repairing error handling bugs in C

Y Tian, B Ray - Proceedings of the 2017 11th joint meeting on …, 2017 - dl.acm.org
Correct error handling is essential for building reliable and secure systems. Unfortunately,
low-level languages like C often do not support any error handling primitives and leave it up …