From detection to optimization: impact of soft errors on high-performance computing applications

JC Calhoun - 2017 - ideals.illinois.edu
As high-performance computing (HPC) continues to progress, constraints on HPC system
design force the handling of errors to higher levels in the software stack. Of the types of …

Fault-tolerant and energy-aware algorithms for workflows and real-time systems

L Han - 2020 - theses.hal.science
This thesis is focused on the two major problems in the high performance computing context:
resilience and energy consumption. To satisfy the computing power required by modern …

Toward Resilient Exascale PDE Solvers Using the Combination Technique

A Parra Hinojosa - 2017 - mediatum.ub.tum.de
Future exascale computers will offer unprecedented performance gains, but their increased
complexity introduces new obstacles. System faults will likely affect parallel simulations on a …

A Probabilistic Software Framework for Scalable Data Storage and Integrity Check

S Xiong - 2017 - trace.tennessee.edu
Data has overwhelmed the digital world in terms of volume, variety and velocity. Data-
intensive applications are facing unprecedented challenges. On the other hand …

[PDF] Sirius: Probabilistic data assertions for detecting silent data corruption in parallel programs

TE Thomas, AJ Bhattad, S Mitra, S Bagchi - engineering.purdue.edu
The size and complexity of supercomputing clusters are rapidly increasing to cater to the
needs of complex scientific applications. At the same time, the feature size and operating …

[BOOK] Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems

S Levy - 2016 - search.proquest.com
High-performance computing (HPC) systems enable scientists to numerically model
complex phenomena in many important physical systems. The next major milestone in the …

Research on Venture Judgment System of Dahuofang Drinking Water Source Protection Area

M Xingguan, Z Chen - 2011 Fourth International Conference …, 2011 - ieeexplore.ieee.org
The risk assessment of drinking water source focused on quantitative assessment of
contaminants detected in drinking water on human health risks, but the potential …

[CITATION] Detecting soft errors in stencil based computations

[CITATION] Resilient scheduling algorithms for large-scale platforms

O Beaumont - 2020 - Université de Pittsburgh Rapporteur …

[CITATION] D3.5: Runtime support for significance-sensitive, power-efficient checkpointing and localized restarts