A visual comparison of silent error propagation

Z Li, H Menon, K Mohror, S Liu, L Guo… - … on Visualization and …, 2022 - ieeexplore.ieee.org
High-performance computing (HPC) systems play a critical role in facilitating scientific
discoveries. Their scale and complexity (e.g., the number of computational units and software …

Anomaly detection in scientific datasets using sparse representation

A Moon, M Kim, J Chen, SW Son - Proceedings of the First Workshop on …, 2023 - dl.acm.org
As the size and complexity of high-performance computing (HPC) systems keep growing,
scientists' ability to trust the data produced is paramount due to potential data corruption for …

Classification based survey of image registration methods

K Sharma, A Goyal - 2013 Fourth International Conference on …, 2013 - ieeexplore.ieee.org
Image registration is useful for a variety of applications, ranging from surveillance to
image mosaicing, where the task is to match two or more pictures taken, for example, at different …

Resilience and fault tolerance in high-performance computing for numerical weather and climate prediction

T Benacchio, L Bonaventura… - … Journal of High …, 2021 - journals.sagepub.com
Progress in numerical weather and climate prediction accuracy greatly depends on the
growth of the available computing power. As the number of cores in top computing facilities …

Sensitivity of computational fluid dynamics simulations against soft errors

EF Yetkin, Ş Pişkin - Computing, 2021 - Springer
Computational capabilities of the largest high performance computing systems have
increased more than 100-fold in the last 10 years and keep increasing substantially …

Toward general software level silent data corruption detection for parallel applications

E Berrocal, L Bautista-Gomez, S Di… - … on Parallel and …, 2017 - ieeexplore.ieee.org
Silent data corruption (SDC) poses a great challenge for high-performance computing
(HPC) applications as we move to extreme-scale systems. Mechanisms have been …

SDC is in the eye of the beholder: A survey and preliminary study

B Fang, P Wu, Q Guan, N DeBardeleben… - 2016 46th Annual …, 2016 - ieeexplore.ieee.org
Silent data corruptions (SDCs) are one of the most critical issues in modern HPC systems,
as they are "silent" by definition and raise no warnings to users and application developers …

Multi-level checkpointing and silent error detection for linear workflows

A Benoit, A Cavelan, Y Robert, H Sun - Journal of Computational Science, 2018 - Elsevier
We focus on High Performance Computing (HPC) workflows whose dependency
graph forms a linear chain, and we extend single-level checkpointing in two important …

MACORD: online adaptive machine learning framework for silent error detection

O Subasi, S Di, P Balaprakash, O Unsal… - 2017 IEEE …, 2017 - ieeexplore.ieee.org
Future high-performance computing (HPC) systems with ever-increasing resource capacity
(such as compute cores, memory and storage) may significantly increase the risks on …

Exploring partial replication to improve lightweight silent data corruption detection for HPC applications

E Berrocal, L Bautista-Gomez, S Di, Z Lan… - Euro-Par 2016: Parallel …, 2016 - Springer
Silent data corruption (SDC) poses a great challenge for high-performance computing
(HPC) applications as we move to extreme-scale systems. If not dealt with properly, SDC …