Advanced introduction to spatial statistics

DA Griffith, B Li - 2022 - books.google.com
This Advanced Introduction provides a critical review and discussion of research concerning
spatial statistics, differentiating between it and spatial econometrics, to answer a set of core …

A runtime heuristic to selectively replicate tasks for application-specific reliability targets

O Subasi, G Yalcin, F Zyulkyarov… - 2016 IEEE …, 2016 - ieeexplore.ieee.org
In this paper we propose a runtime-based selective task replication technique for task-
parallel high performance computing applications. Our selective task replication technique is …

A Gaussian process approach for effective soft error detection

O Subasi, S Krishnamoorthy - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
In this paper, we present a non-parametric data-analytic soft-error detector. Our detector uses
the key properties of Gaussian process regression. First, because Gaussian process …

FPDetect: Efficient Reasoning About Stencil Programs Using Selective Direct Evaluation

A Das, S Krishnamoorthy, I Briggs… - ACM Transactions on …, 2020 - dl.acm.org
We present FPDetect, a low-overhead approach for detecting logical errors and soft errors
affecting stencil computations without generating false positives. We develop an offline …

Mitigation of failures in high performance computing via runtime techniques

X Ni - 2016 - ideals.illinois.edu
As machines increase in scale, it is predicted that failure rates of supercomputers will
correspondingly increase. Even though the mean time to failure (MTTF) of individual …

Understanding and improving the trust in results of numerical simulations and scientific data analytics

F Cappello, R Gupta, S Di, E Constantinescu… - Euro-Par 2017: Parallel …, 2018 - Springer
With ever-increasing execution scale of parallel scientific simulations, potential unnoticed
corruptions to scientific data during simulation make users more suspicious about the …

Reliability for exascale computing: system modelling and error mitigation for task-parallel HPC applications

O Subasi - 2016 - upcommons.upc.edu
As high performance computing (HPC) systems continue to grow, their fault rate increases.
Applications running on these systems have to deal with rates on the order of hours or days …

An extensive study on iterative solver resilience: characterization, detection and prediction

BO Mutlu - 2019 - upcommons.upc.edu
Soft errors caused by transient bit flips have the potential to significantly impact an
application's behavior. This has motivated the design of an array of techniques to detect …

Exploiting task-based programming models for resilience

L Jaulmes - 2019 - upcommons.upc.edu
Hardware errors become more common as silicon technologies shrink and become more
vulnerable, especially in memory cells, which are the most exposed to errors. Permanent …

FailAmp: Relativization transformation for soft error detection in structured address generation

I Briggs, A Das, M Baranowski, V Sharma… - ACM Transactions on …, 2019 - dl.acm.org
We present FailAmp, a novel LLVM program transformation algorithm that makes programs
employing structured index calculations more robust against soft errors. Without FailAmp, an …