Evaluating operating system vulnerability to memory errors

KB Ferreira, K Pedretti, R Brightwell… - Proceedings of the 2nd …, 2012 - dl.acm.org
Reliability is of great concern to the scalability of extreme-scale systems. Of particular
concern are soft errors in main memory, which are a leading cause of failures on current …

Design, use and evaluation of P-FSEFI: A parallel soft error fault injection framework for emulating soft errors in parallel applications

Q Guan, N DeBardeleben, P Wu, S Eidenbenz… - Proceedings of the 9th …, 2016 - dl.acm.org
Future exascale application programmers and users have a need to quantify an
application's resilience and vulnerability to soft errors before running their codes on …

Exploring the impact of soft errors on NoC-based multiprocessor systems

FT Bortolon, G Abich, S Bampi, R Reis… - … on Circuits and …, 2018 - ieeexplore.ieee.org
Software reliability is an essential design metric in emerging large-scale multiprocessor
embedded systems. Designers should identify soft error susceptibility of multiple …

gem5-FIM: a flexible and scalable multicore soft error assessment framework to early reliability design space explorations

FR Da Rosa, R Reis, L Ost - 2018 IEEE 9th Latin American …, 2018 - ieeexplore.ieee.org
Increasing chip power densities allied to the continuous technology shrink are making
emerging multiprocessor embedded systems more vulnerable to radiation-induced transient …

SEU Reliability Assessment Framework for COTS Many-core Processors

C Dammak, OA Mohamed… - … on Microelectronics (ICM …, 2022 - ieeexplore.ieee.org
The high level of performance intrinsic to many-core architectures has made them the
obvious successor to single-core processors in scenarios that require high computation …

On soft error reliability of virtualization infrastructure

X Xu, HH Huang - IEEE Transactions on Computers, 2016 - ieeexplore.ieee.org
Hardware errors are no longer exceptions in modern cloud data centers. Although
virtualization provides software failure isolation among different virtual machines (VM), the …

Fast Kernel Error Propagation Analysis in Virtualized Environments

N Coppik, O Schwahn, N Suri - 2021 14th IEEE Conference on …, 2021 - ieeexplore.ieee.org
Assessing operating system dependability remains a challenging problem, particularly in
monolithic systems. Component interfaces are not well-defined and boundaries are not …

Evaluation of compilers effects on OpenMP soft error resiliency

J Gava, V Bandeira, R Reis, L Ost - 2019 IEEE Computer …, 2019 - ieeexplore.ieee.org
Software engineers are using different compilers and parallel programming models (e.g.,
Pthreads, OpenMP) to take the best performance offered by multicore systems. Both …

[BOOK][B] Virtual lockstep for fault tolerance and architectural vulnerability analysis

CM Jeffery - 2009 - search.proquest.com
This dissertation presents a flexible technique that can be applied to commodity many-core
architectures to exploit idle resources and ensure reliable system operation. The proposed …

Fault characterization and mitigation strategies in desktop cloud systems

CE Gómez, J Chavarriaga, HE Castro - Latin American High Performance …, 2018 - Springer
Desktop cloud platforms, such as UnaCloud and CernVM, run clusters of virtual machines
taking advantage of idle resources on desktop computers. These platforms execute virtual …