A comprehensive detection of memory corruption vulnerabilities for C/C++ programs

Y Gao, L Chen, G Shi, F Zhang - … IEEE Intl Conf on Parallel & …, 2018 - ieeexplore.ieee.org
Memory corruption bugs in software written in low-level languages like C or C++ are one of
the oldest problems in computer security. These unsafe languages are vulnerable to errors …

Albadross: Active learning based anomaly diagnosis for production HPC systems

B Aksar, E Sencan, B Schwaller, O Aaziz… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
Diagnosing causes of performance variations in High-Performance Computing (HPC)
systems is a daunting challenge due to the systems' scale and complexity. Variations in …

Dr. DNA: Combating Silent Data Corruptions in Deep Learning using Distribution of Neuron Activations

D Ma, F Lin, A Desmaison, J Coburn, D Moore… - Proceedings of the 29th …, 2024 - dl.acm.org
Deep neural networks (DNNs) have been widely adopted in various safety-critical
applications such as computer vision and autonomous driving. However, as technology …

Detecting errors using multi-cycle invariance information

N Alves, K Nepal, J Dworak… - 2009 Design, Automation …, 2009 - ieeexplore.ieee.org
Ensuring reliable computation at the nanoscale requires mechanisms to detect and correct
errors during normal circuit operation. In this paper we propose a method for designing …

Vulnerability analysis of instructions for SDC-causing error detection

J Gu, W Zheng, Y Zhuang, Q Zhang - IEEE Access, 2019 - ieeexplore.ieee.org
Due to the centralization of communication in the management of data generated by diverse
Internet of Things (IoT) devices, there is a lack of reliability when data is being transferred and …

Understanding Permanent Hardware Failures in Deep Learning Training Accelerator Systems

Y He, Y Li - 2023 IEEE European Test Symposium (ETS), 2023 - ieeexplore.ieee.org
Hardware failures pose critical threats to deep neural network (DNN) training workloads,
and the urgency of tackling this challenge (known as the Silent Data Corruption challenge in …

Framework for economical error recovery in embedded cores

G Upasani, X Vera, A Gonzalez - 2014 IEEE 20th International …, 2014 - ieeexplore.ieee.org
The vulnerability of the current and future processors towards transient errors caused by
particle strikes is expected to increase rapidly because of exponential growth rate of on-chip …

ML-based online design error localization for RISC-V implementations

H Selg, M Jenihhin, P Ellervee… - 2023 IEEE 29th …, 2023 - ieeexplore.ieee.org
The accelerated growth of computing systems' complexity makes comprehensive design
verification challenging and time-consuming. In practice, hard-to-model complex …

Understanding and analyzing interconnect errors and network congestion on a large scale HPC system

M Kumar, S Gupta, T Patel, M Wilder… - 2018 48th Annual …, 2018 - ieeexplore.ieee.org
Today's High Performance Computing (HPC) systems are capable of delivering
performance in the order of petaflops due to the fast computing devices, network …

On algorithms selection for unsupervised anomaly detection

T Zoppi, A Ceccarelli… - 2018 IEEE 23rd Pacific Rim …, 2018 - ieeexplore.ieee.org
Anomaly detection, which aims at identifying unexpected trends and data patterns, has
widely been used to build error detectors, failure predictors or intrusion detectors. Internal …