An efficient silent data corruption detection method with error-feedback control and even sampling for HPC applications

S Di, E Berrocal, F Cappello - 2015 15th IEEE/ACM …, 2015 - ieeexplore.ieee.org
HPC application traces and real runs on the Argonne Fusion cluster. Experiments show that
our error feedback … In Section IV, we describe our error feedback control model in the context …

Evaluating and accelerating high-fidelity error injection for hpc

CK Chang, S Lym, N Kelly… - … for high performance …, 2018 - ieeexplore.ieee.org
… overhead (see Section VB), at the same time providing application developers useful
feedback regarding resilience from a higher-level point of view. Our methodology and pluggable …

Pshifter: Feedback-based dynamic power shifting within hpc jobs for performance

N Gholkar, F Mueller, B Rountree… - Proceedings of the 27th …, 2018 - dl.acm.org
… Control Signal Calculator: The error (err) in the system state is calculated as the difference
… The feedback controller calculates the feedback to the actuator as a function of this error. It …

Predictable high-performance computing using feedback control and admission control

SM Park, M Humphrey - IEEE Transactions on Parallel and …, 2010 - ieeexplore.ieee.org
… the successful application of classic control theory to HPC … to the requested deadlines (12.4
percent error). The rest of the … the performance modeling of HPC applications. Section 4 …

Fliptracker: Understanding natural error resilience in hpc applications

L Guo, D Li, I Laguna, M Schulz - … High Performance Computing …, 2018 - ieeexplore.ieee.org
… tracking of error propagation and resilience properties, and we use it to present a set of
computation patterns that are responsible for making representative HPC applications naturally …

Monitoring HPC applications in the production environment

H Sharifi, O Aaziz, J Cook - … Programming for Analytics Applications, 2015 - dl.acm.org
… Software and hardware fault tolerance, scaling performance issues, soft error effect on … the
performance feedback to the Analyzer. This component analyzes the feedback and generates …

Autotuning in high-performance computing applications

P Balaprakash, J Dongarra, T Gamblin… - Proceedings of the …, 2018 - ieeexplore.ieee.org
HPC application developers to develop autotuning technology that meets these goals and is
compatible with the development of HPC … For tuning tools that rely on dynamic feedback at …

Preparing HPC applications for exascale: Challenges and recommendations

E Abraham, C Bekas, I Brandic… - … on Network-Based …, 2015 - ieeexplore.ieee.org
… At runtime, the model is a posteriori tuned to support activities such as feedback-oriented …
These issues are addressed typically by programmers in a “trial and error” manner, i.e. by …

Adaptive impact-driven detection of silent data corruption for HPC applications

S Di, F Cappello - IEEE Transactions on Parallel and …, 2016 - ieeexplore.ieee.org
… 4.2 Error Feedback Prediction In our detector, we use curve … error feedback model that
can significantly simplify the prediction. We also demonstrate the identity between the feedback

Bayesperf: minimizing performance monitoring errors using bayesian statistics

SS Banerjee, S Jha, Z Kalbarczyk, RK Iyer - Proceedings of the 26th …, 2021 - dl.acm.org
… are untenable in emergent applications that use HPCs as inputs to complete a feedback loop
and … In this paper, we define HPC error as magnitude of difference between corresponding …