Response of HPC hardware to neutron radiation at the dawn of exascale

A Bustos, AJ Rubio-Montero, R Méndez… - The Journal of …, 2023 - Springer
Every computation presents a small chance that an unexpected phenomenon ruins or
modifies its output. Computers are prone to errors that, although they may be very unlikely, are …

Straggler-Tolerant Stationary Methods for Linear Systems

V Kalantzis, Y Xi, L Horesh, Y Saad - SIAM Journal on Scientific Computing, 2025 - SIAM
In this paper, we consider the iterative solution of sparse systems of linear algebraic
equations under the condition that sparse matrix-vector products with the coefficient matrix …

Error resilience of three GMRES implementations under fault injection

JA Moríñigo, A Bustos, R Mayo-García - The Journal of Supercomputing, 2022 - Springer
The resilience behavior of three GMRES prototyped implementations (with Incomplete LU,
Flexible and randomized-SVD-based preconditioners) has been analyzed with a soft …

Randomized linear solvers for computational architectures with straggling workers

V Kalantzis, Y Xi, L Horesh, Y Saad - arXiv e-prints, 2024 - researchgate.net
In this paper, we consider the iterative solution of sparse systems of linear algebraic
equations under the condition that sparse matrix-vector products with the coefficient matrix …

Resilience for asynchronous iterative methods for sparse linear systems

E Coleman - 2019 - search.proquest.com
Large-scale simulations are used in a variety of application areas in science and
engineering to help forward the progress of innovation. Many spend the vast majority of their …

Impacts of three soft-fault models on hybrid parallel asynchronous iterative methods

E Coleman, EJ Jensen… - 2018 30th International …, 2018 - ieeexplore.ieee.org
This study seeks to understand the soft error vulnerability of asynchronous iterative methods,
with a focus on stationary iterative solvers such as Jacobi. The implementations make use of …

Simulation Framework for Asynchronous Iterative Methods

EC Coleman, E Jensen… - Journal of Simulation …, 2018 - articles.jsime.org
As high-performance computing (HPC) platforms progress towards exascale, computational
methods must be revamped to successfully leverage them. In particular, (1) asynchronous …