Estimating the failures and silent errors rates of cpus across isas and microarchitectures

D Gizopoulos, G Papadimitriou… - … IEEE International Test …, 2023 - ieeexplore.ieee.org
Silent data corruptions (SDCs) pose a significant challenge to the reliable operation of
modern microprocessors. As the need for enhanced performance and reliability continues to …

Characterizing the impact of soft errors across microarchitectural structures and implications for predictability

B Wibowo, A Agrawal, J Tuck - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
The trends of transistor size and system complexity scaling continue. As a result, soft errors
in the system, including the processor core, are predicted to become one of the major …

Silent data errors: Sources, detection, and modeling

A Singh, S Chakravarty, G Papadimitriou… - 2023 IEEE 41st VLSI …, 2023 - ieeexplore.ieee.org
Chip manufacturers and hyperscalers are becoming increasingly aware of the problem
posed by Silent Data Errors (SDE) and are taking steps to address it. Major computing …

Silent data corruptions: The stealthy saboteurs of digital integrity

G Papadimitriou, D Gizopoulos… - 2023 IEEE 29th …, 2023 - ieeexplore.ieee.org
Silent Data Corruptions (SDCs) pose a significant threat to the integrity of digital systems.
These stealthy saboteurs silently corrupt data, remaining undetected by traditional error …

Assessing the impact of hard faults in performance components of modern microprocessors

N Foutris, D Gizopoulos… - 2013 IEEE 31st …, 2013 - ieeexplore.ieee.org
A growing portion of the silicon area of modern high-performance microprocessors is
dedicated to components that increase performance but do not determine functional …

Spatial support vector regression to detect silent errors in the exascale era

O Subasi, S Di, L Bautista-Gomez… - 2016 16th IEEE/ACM …, 2016 - ieeexplore.ieee.org
As the exascale era approaches, the increasing capacity of high-performance computing
(HPC) systems with targeted power and energy budget goals introduces significant …

Understanding silent data corruptions in a large production cpu population

S Wang, G Zhang, J Wei, Y Wang, J Wu… - Proceedings of the 29th …, 2023 - dl.acm.org
Silent Data Corruption (SDC) in processors can lead to various application-level issues,
such as incorrect calculations and even data loss. Since traditional techniques are not …

MACORD: online adaptive machine learning framework for silent error detection

O Subasi, S Di, P Balaprakash, O Unsal… - 2017 IEEE …, 2017 - ieeexplore.ieee.org
Future high-performance computing (HPC) systems with ever-increasing resource capacity
(such as compute cores, memory and storage) may significantly increase the risks on …

Performance-aware reliability assessment of heterogeneous chips

A Chatzidimitriou, M Kaliorakis… - 2017 IEEE 35th VLSI …, 2017 - ieeexplore.ieee.org
Technology evolution has raised serious reliability considerations, as transistor dimensions
shrink and modern microprocessors become denser and more vulnerable to faults …

An efficient silent data corruption detection method with error-feedback control and even sampling for HPC applications

S Di, E Berrocal, F Cappello - 2015 15th IEEE/ACM …, 2015 - ieeexplore.ieee.org
The silent data corruption (SDC) problem is attracting more and more attentions because it
is expected to have a great impact on exascale HPC applications. SDC faults are hazardous …