Next-generation supercomputers are expected to have more components and, at the same time, consume several times less energy per operation. Consequently, the number of soft …
S Di, F Cappello - IEEE Transactions on Parallel and …, 2016 - ieeexplore.ieee.org
For exascale HPC applications, silent data corruption (SDC) is one of the most dangerous problems because nothing indicates that errors have occurred during execution. We …
High-performance computing is a powerful tool that allows scientists to study complex natural phenomena. Extreme-scale supercomputers promise orders of magnitude higher …
As we move toward exascale platforms, silent data corruptions (SDC) are likely to occur more frequently. Such errors can lead to incorrect results. Attempts have been made to use …
S Di, E Berrocal, F Cappello - 2015 15th IEEE/ACM …, 2015 - ieeexplore.ieee.org
The silent data corruption (SDC) problem is attracting more and more attention because it is expected to have a great impact on exascale HPC applications. SDC faults are hazardous …
As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems with targeted power and energy budget goals introduces significant …
Next-generation supercomputers are expected to have more components and, at the same time, consume several times less energy per operation. This situation is pushing …
D Juedes, F Drews, L Welch… - … Parallel and Distributed …, 2004 - ieeexplore.ieee.org
Summary form only given. We examine several heuristic algorithms for the maximum allowable workload (MAW) problem for real-time systems with tasks having variable …
A Gainaru, F Cappello - Fault-Tolerance Techniques for High-Performance …, 2015 - Springer
Understanding the behavior of failures in large-scale systems is important in order to design techniques to tolerate them. Reliability knowledge of resources can be used in numerous …