Understanding silent data corruptions in a large production CPU population

S Wang, G Zhang, J Wei, Y Wang, J Wu… - Proceedings of the 29th …, 2023 - dl.acm.org
Silent Data Corruption (SDC) in processors can lead to various application-level issues,
such as incorrect calculations and even data loss. Since traditional techniques are not …

On-Chip Bus Protection against Soft Errors

J Mach, L Kohútka, P Čičák - Electronics, 2023 - mdpi.com
The increasing performance demands for processors leveraged in mission and safety-
critical applications mean that the processors are implemented in smaller fabrication …

Understanding fault-tolerance vulnerabilities in advanced SoC FPGAs for critical applications

N Cherezova, K Shibin, M Jenihhin, A Jutman - Microelectronics Reliability, 2023 - Elsevier
The emergence of heterogeneous FPGA-based SoCs and their growing complexity fueled
by the introduction of various accelerators bring the reliability aspect of these systems to the …

Harpocrates: Breaking the silence of CPU faults through hardware-in-the-loop program generation

N Karystinos, O Chatzopoulos… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Several hyperscalers have recently disclosed the occurrence of Silent Data Corruptions
(SDCs) in their systems fleets, sparking concerns about the severity of known and the …

Braum: Analyzing and protecting autonomous machine software stack

Y Gan, P Whatmough, J Leng, B Yu… - 2022 IEEE 33rd …, 2022 - ieeexplore.ieee.org
Autonomous machines, such as Autonomous Vehicles (AV), are vulnerable to a variety of
different faults such as radiation-induced soft/transient errors, adversarial attacks, and …

Radiation testing of a multiprocessor macrosynchronized lockstep architecture with FreeRTOS

PM Aviles, A Lindoso, JA Belloch… - … on Nuclear Science, 2021 - ieeexplore.ieee.org
Nowadays, high-performance microprocessors are demanded in many fields, including
those with high-reliability requirements. Commercial microprocessors present a good …

Impact of transient faults on timing behavior and mitigation with near-zero WCET overhead

PR Nikiema, A Kritikakou, M Traiola… - ECRTS 2023-35th …, 2023 - hal.science
As time-critical systems require timing guarantees, Worst-Case Execution Times (WCET)
have to be employed. However, WCET estimation methods usually assume fault-free …

Novel lockstep-based fault mitigation approach for SoCs with roll-back and roll-forward recovery

S Kasap, EW Wächter, X Zhai, S Ehsan… - Microelectronics …, 2021 - Elsevier
All-Programmable System-on-Chips (APSoCs) constitute a compelling option for
employing applications in radiation environments thanks to their high-performance …

The Vulnerability-Adaptive Protection Paradigm

Z Wan, Y Gan, B Yu, S Liu, A Raychowdhury… - Communications of the …, 2024 - dl.acm.org

Lock-V: A heterogeneous fault tolerance architecture based on Arm and RISC-V

I Marques, C Rodrigues, A Tavares, S Pinto… - Microelectronics …, 2021 - Elsevier
This article presents Lock-V, a heterogeneous fault tolerance architecture that explores a
dual-core lockstep (DCLS) technique to mitigate single event upset (SEU) and common …