Understanding silent data corruptions in a large production CPU population

S Wang, G Zhang, J Wei, Y Wang, J Wu… - Proceedings of the 29th …, 2023 - dl.acm.org
Silent Data Corruption (SDC) in processors can lead to various application-level issues,
such as incorrect calculations and even data loss. Since traditional techniques are not …

Impact of voltage scaling on soft errors susceptibility of multicore server CPUs

D Agiakatsikas, G Papadimitriou, V Karakostas… - Proceedings of the 56th …, 2023 - dl.acm.org
Microprocessor power consumption and dependability are both crucial challenges that
designers have to cope with due to shrinking feature sizes and increasing transistor counts …

gem5-MARVEL: Microarchitecture-level resilience analysis of heterogeneous SoC architectures

O Chatzopoulos, G Papadimitriou… - … Symposium on High …, 2024 - ieeexplore.ieee.org
In this paper, we present gem5-MARVEL, the first consolidated microarchitecture-level fault
injection infrastructure for heterogeneous System-on-Chip architectures comprising CPUs of …

Silent data corruptions: The stealthy saboteurs of digital integrity

G Papadimitriou, D Gizopoulos… - 2023 IEEE 29th …, 2023 - ieeexplore.ieee.org
Silent Data Corruptions (SDCs) pose a significant threat to the integrity of digital systems.
These stealthy saboteurs silently corrupt data, remaining undetected by traditional error …

Soft Error Resilience at Near-Zero Cost

J Zeng, SY Huang, J Liu, C Jung - Proceedings of the 38th ACM …, 2024 - dl.acm.org
Among existing schemes for soft error resilience, acoustic-sensor-based detection stands
out owing to its ability to prevent silent data corruption at low hardware cost. However, the …

Estimating the failures and silent errors rates of CPUs across ISAs and microarchitectures

D Gizopoulos, G Papadimitriou… - … IEEE International Test …, 2023 - ieeexplore.ieee.org
Silent data corruptions (SDCs) pose a significant challenge to the reliable operation of
modern microprocessors. As the need for enhanced performance and reliability continues to …

gemV-tool: A Comprehensive Soft Error Reliability Estimation Tool for Design Space Exploration

H So, Y Ko, J Jung, K Lee, A Shrivastava - Electronics, 2023 - mdpi.com
With aggressive technology scaling, soft errors have become a major threat in modern
computing systems. Several techniques have been proposed in the literature and …

An automated framework for selectively tolerating SDC errors based on rigorous instruction-level vulnerability assessment

HA Ahmad, Y Sedaghat - Future Generation Computer Systems, 2024 - Elsevier
The recent trend in most processor manufacturing technologies has significantly increased
the vulnerability of embedded systems operating in harsh environments against soft errors …

BiGResi: Robust bit-level fault injection framework for assessing intrinsic software resilience against soft errors

HA Ahmad, Y Sedaghat - Computers and Electrical Engineering, 2024 - Elsevier
Radiation-induced soft errors, although rare, pose a significant threat to the reliability of
systems. Assessing the intrinsic resilience of software to soft errors is therefore essential for …

Silent Data Corruptions in Computing Systems: Early Predictions and Large-Scale Measurements

D Gizopoulos, G Papadimitriou… - 2024 IEEE European …, 2024 - ieeexplore.ieee.org
Silent Data Corruptions (SDCs) due to defects in computing chips (CPUs, GPUs, AI
accelerators) are a critical threat to the quality of large-scale computing in different application …