Authors
Omer Subasi, Sheng Di, Leonardo Bautista-Gomez, Prasanna Balaprakash, Osman Unsal, Jesus Labarta, Adrian Cristal, Sriram Krishnamoorthy, Franck Cappello
Publication date
2018/9/1
Journal
Sustainable Computing: Informatics and Systems
Volume
19
Pages
277-290
Publisher
Elsevier
Description
As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems with targeted power and energy budget goals introduces significant challenges in reliability. Silent data corruptions (SDCs), or silent errors, are one of the major sources that corrupt the execution results of HPC applications without being detected.
In this work, we explore a set of novel SDC detectors – by leveraging epsilon-insensitive support vector machine regression – to detect SDCs that occur in HPC applications. The key contributions are threefold. (1) Our exploration takes temporal, spatial, and spatiotemporal features into account and analyzes different detectors based on different features. (2) We provide an in-depth study on the detection ability and performance with different parameters, and we optimize the detection range carefully. (3) Experiments with eight real-world HPC applications show that …
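As a rough illustration of the idea in the description, the sketch below shows the standard epsilon-insensitive loss used in support vector regression and a simple range check that flags a value as a suspected SDC when it falls too far from a predicted value. Everything here is an assumption made for illustration: the function names, the `detection_radius` parameter, and the linear-extrapolation predictor (a stand-in for a trained SVR model) do not come from the paper.

```python
def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Standard epsilon-insensitive loss: errors inside the epsilon tube cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def predict_next(history):
    """Stand-in temporal predictor (assumption): linear extrapolation from the
    last two values. A trained SVR model would take this role instead."""
    return 2 * history[-1] - history[-2]


def is_suspected_sdc(observed, predicted, detection_radius):
    """Hypothetical detection rule: flag values outside the range
    [predicted - detection_radius, predicted + detection_radius]."""
    return abs(observed - predicted) > detection_radius


# Temporal feature: the recent history of one data point across time steps.
history = [1.0, 1.1, 1.2, 1.3]
predicted = predict_next(history)

clean_value = 1.41      # close to the smooth trend
corrupted_value = 5.0   # e.g. the result of a high-order bit flip

flags = [is_suspected_sdc(v, predicted, detection_radius=0.1)
         for v in (clean_value, corrupted_value)]
```

A wider `detection_radius` lowers the false-positive rate but lets small corruptions pass undetected, which is presumably the trade-off behind tuning the detection range.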
Total citations
Citations per year: 2019: 2, 2020: 6, 2021: 4, 2022: 3, 2023: 2
Scholar articles
O Subasi, S Di, L Bautista-Gomez, P Balaprakash… - Sustainable Computing: Informatics and Systems, 2018