Fault injection analytics: A novel approach to discover failure modes in cloud-computing systems

D Cotroneo, L De Simone, P Liguori… - IEEE transactions on …, 2020 - ieeexplore.ieee.org
Cloud computing systems fail in complex and unexpected ways due to unexpected
combinations of events and interactions between hardware and software components. Fault …

Evaluating the Effectiveness of Microarchitectural Hardware Fault Detection for Application-Specific Requirements

KN Papadopoulos, C Giannoula… - arXiv preprint arXiv …, 2024 - arxiv.org
Reliability is necessary in safety-critical applications spanning numerous domains.
Conventional hardware-based fault tolerance techniques, such as component redundancy …

Anatomy of on-chip memory hardware fault effects across the layers

G Papadimitriou, D Gizopoulos - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Reliability evaluation of a microprocessor design may reveal vulnerable silicon areas that
require protection against faults, but also hardware structures that are inherently more …

Low-cost and efficient fault detection and diagnosis schemes for modern cores

JS Carretero Casado - 2015 - upcommons.upc.edu
Continuous improvements in transistor scaling together with microarchitectural advances
have made possible the widespread adoption of high-performance processors across all …

Eris: Fault Injection and Tracking Framework for Reliability Analysis of Open-Source Hardware

S Nema, J Kirschner, D Adak, S Agarwal… - … Analysis of Systems …, 2022 - ieeexplore.ieee.org
As transistors have been scaled over the past decade, modern systems have become
increasingly susceptible to faults. Increased transistor densities and lower capacitances …

Extensive evaluation of programming models and ISAs impact on multicore soft error reliability

F Rosa, V Bandeira, R Reis, L Ost - Proceedings of the 55th Annual …, 2018 - dl.acm.org
To take advantage of the performance enhancements provided by multicore processors,
new instruction set architectures (ISAs) and parallel programming libraries have been …

[HTML] SOFIA: An automated framework for early soft error assessment, identification, and mitigation

J Gava, V Bandeira, F Rosa, R Garibotti, R Reis… - Journal of Systems …, 2022 - Elsevier
The occurrence of radiation-induced soft errors in electronic computing systems can either
affect non-essential system functionalities or violate safety-critical conditions, which might …

Application health monitoring for extreme‐scale resiliency using cooperative fault management

PK Agarwal, T Naughton, BH Park… - Concurrency and …, 2020 - Wiley Online Library
Resiliency is and will be a critical factor in determining scientific productivity on current and
exascale supercomputers, and beyond. Applications oblivious to and incapable of handling …

[BOOK] Soft error modeling and analysis for microprocessors

X Li - 2008 - search.proquest.com
Soft errors are a growing concern for processor reliability. Recent work has motivated
architecture level studies of soft errors since the architecture level can mask many raw errors …

Cross-Layer Fault Analysis for Microprocessor Architectures (CLAM)

I Alshaer - 2023 - theses.hal.science
With the widespread use of embedded system devices, hardware designers and software
developers started paying more attention to security issues in order to protect these devices …