Spatial support vector regression to detect silent errors in the exascale era

O Subasi, S Di, L Bautista-Gomez… - 2016 16th IEEE/ACM …, 2016 - ieeexplore.ieee.org
As the exascale era approaches, the increasing capacity of high-performance computing
(HPC) systems with targeted power and energy budget goals introduces significant …

Exploring the capabilities of support vector machines in detecting silent data corruptions

O Subasi, S Di, L Bautista-Gomez… - … Informatics and Systems, 2018 - Elsevier
As the exascale era approaches, the increasing capacity of high-performance computing
(HPC) systems with targeted power and energy budget goals introduces significant …

MACORD: online adaptive machine learning framework for silent error detection

O Subasi, S Di, P Balaprakash, O Unsal… - 2017 IEEE …, 2017 - ieeexplore.ieee.org
Future high-performance computing (HPC) systems with ever-increasing resource capacity
(such as compute cores, memory and storage) may significantly increase the risks on …

Soft error detection for iterative applications using offline training

J Liu, G Agrawal - 2016 IEEE 23rd International Conference on …, 2016 - ieeexplore.ieee.org
Silent data corruption (SDC) from soft errors is one of the challenges for Exascale systems
as the number of cores is increasing and the feature size is decreasing. In recent years, a …

Exploiting spatial smoothness in HPC applications to detect silent data corruption

L Bautista-Gomez, F Cappello - 2015 IEEE 17th International …, 2015 - ieeexplore.ieee.org
Next-generation supercomputers are expected to have more components and, at the same
time, consume several times less energy per operation. This situation is pushing …

An efficient silent data corruption detection method with error-feedback control and even sampling for HPC applications

S Di, E Berrocal, F Cappello - 2015 15th IEEE/ACM …, 2015 - ieeexplore.ieee.org
The silent data corruption (SDC) problem is attracting more and more attentions because it
is expected to have a great impact on exascale HPC applications. SDC faults are hazardous …

Neural network based silent error detector

C Wang, N Dryden, F Cappello… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
As we move toward exascale platforms, silent data corruptions (SDC) are likely to occur
more frequently. Such errors can lead to incorrect results. Attempts have been made to use …

LADR: Low-cost application-level detector for reducing silent output corruptions

C Chen, G Eisenhauer, M Wolf, S Pande - Proceedings of the 27th …, 2018 - dl.acm.org
Applications running on future high performance computing (HPC) systems are more likely
to experience transient faults due to technology scaling trends with respect to higher circuit …

Estimating the failures and silent errors rates of cpus across isas and microarchitectures

D Gizopoulos, G Papadimitriou… - … IEEE International Test …, 2023 - ieeexplore.ieee.org
Silent data corruptions (SDCs) pose a significant challenge to the reliable operation of
modern microprocessors. As the need for enhanced performance and reliability continues to …

Lightweight silent data corruption detection based on runtime data analysis for HPC applications

E Berrocal, L Bautista-Gomez, S Di, Z Lan… - Proceedings of the 24th …, 2015 - dl.acm.org
Next-generation supercomputers are expected to have more components and, at the same
time, consume several times less energy per operation. Consequently, the number of soft …