Predictive reliability and fault management in exascale systems: State of the art and perspectives

R Canal, C Hernandez, R Tornero, A Cilardo… - ACM Computing …, 2020 - dl.acm.org
Performance and power constraints come together with Complementary Metal Oxide
Semiconductor technology scaling in future Exascale systems. Technology scaling makes …

A survey on checkpointing strategies: Should we always checkpoint à la Young/Daly?

L Bautista-Gomez, A Benoit, S Di, T Herault… - Future Generation …, 2024 - Elsevier
The Young/Daly formula provides an approximation of the optimal checkpointing
period for a parallel application executing on a supercomputing platform. It was originally …
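For context, the first-order Young/Daly period is commonly written as T = sqrt(2 μ C). Here μ is the platform's mean time between failures (MTBF) and C is the cost of one checkpoint. The approximation is meant for the regime where C is much smaller than μ. The minimal Python sketch below assumes both inputs use the same time unit. The function name `young_daly_period` is hypothetical and does not come from the surveyed paper.

```python
import math


def young_daly_period(mtbf: float, checkpoint_cost: float) -> float:
    """Hypothetical helper: first-order Young/Daly period sqrt(2 * mtbf * checkpoint_cost).

    Both arguments must use the same time unit; the result is in that unit.
    The approximation assumes checkpoint_cost is much smaller than mtbf.
    """
    if mtbf <= 0 or checkpoint_cost <= 0:
        raise ValueError("mtbf and checkpoint_cost must be positive")
    return math.sqrt(2.0 * mtbf * checkpoint_cost)


# Illustrative inputs: a 24-hour MTBF (86400 s) and a 10-minute checkpoint (600 s).
print(young_daly_period(86400.0, 600.0))  # roughly 10182 s, i.e. about 2.8 hours
```

The survey's title asks whether this period should always be used, so treat the sketch as the baseline the authors compare against, not as their recommendation.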

Prediction of nanofluids viscosity using random forest (RF) approach

M Gholizadeh, M Jamei, I Ahmadianfar… - … and Intelligent Laboratory …, 2020 - Elsevier
Accurate estimation of viscosity, one of the most important thermo-physical properties of
nanofluids, is essential in heat transfer fluid applications in many industries. In this paper, for …

Determining Philippine coconut maturity level using machine learning algorithms based on acoustic signal

JA Caladcad, S Cabahug, MR Catamco… - … and electronics in …, 2020 - Elsevier
Advanced intelligent systems are becoming significant to many sectors, including farming. In
agriculture, the intelligent classification of post-harvested fruits seems to have a direct impact …

Understanding a program's resiliency through error propagation

Z Li, H Menon, K Mohror, PT Bremer, Y Livnat… - Proceedings of the 26th …, 2021 - dl.acm.org
Aggressive technology scaling trends have worsened the transient fault problem in high-performance
computing (HPC) systems. Some faults are benign, but others can lead to silent …

Detection of Unit of Measure Inconsistency in gas turbine sensors by means of Support Vector Machine classifier

L Manservigi, D Murray, JA de la Iglesia, GF Ceschini… - ISA transactions, 2022 - Elsevier
The reliability of gas turbine diagnostics clearly relies on reliable measurements. However,
raw data reliability can be corrupted by label noise issues, as for instance an erroneous …

Response of HPC hardware to neutron radiation at the dawn of exascale

A Bustos, AJ Rubio-Montero, R Méndez… - The Journal of …, 2023 - Springer
Every computation presents a small chance that an unexpected phenomenon ruins or
modifies its output. Computers are prone to errors that, although they may be very unlikely, are …

Optimal Classifier to Detect Unit of Measure Inconsistency in Gas Turbine Sensors

L Manservigi, M Venturini, E Losi, G Bechini… - Machines, 2022 - mdpi.com
Label noise is a harmful issue that arises when data are erroneously labeled. Several label
noise issues can occur but, among them, unit of measure inconsistencies (UMIs) are …

Anomaly detection in scientific datasets using sparse representation

A Moon, M Kim, J Chen, SW Son - Proceedings of the First Workshop on …, 2023 - dl.acm.org
As the size and complexity of high-performance computing (HPC) systems keep growing,
scientists' ability to trust the data produced is paramount due to potential data corruption for …

Ground-truth prediction to accelerate soft-error impact analysis for iterative methods

BO Mutlu, G Kestor, A Cristal, O Unsal… - 2019 IEEE 26th …, 2019 - ieeexplore.ieee.org
Understanding the impact of soft errors on applications can be expensive. Often, it requires
an extensive error injection campaign involving numerous runs of the full application in the …