GPU devices for safety-critical systems: A survey

J Perez-Cerrolaza, J Abella, L Kosmidis… - ACM Computing …, 2022 - dl.acm.org
Graphics Processing Unit (GPU) devices and their associated software programming
languages and frameworks can deliver the computing performance required to facilitate the …

Analyzing and increasing the reliability of convolutional neural networks on GPUs

FF dos Santos, PF Pimenta, C Lunardi… - IEEE Transactions …, 2018 - ieeexplore.ieee.org
Graphics processing units (GPUs) are playing a critical role in convolutional neural networks
(CNNs) for image detection. As GPU-enabled CNNs move into safety-critical environments …

Artificial neural networks for space and safety-critical applications: Reliability issues and potential solutions

P Rech - IEEE Transactions on Nuclear Science, 2024 - ieeexplore.ieee.org
Machine learning is among the greatest advancements in computer science and
engineering and is today used to classify or detect objects, a key feature in autonomous …

Understanding GPU errors on large-scale HPC systems and the implications for system design and operation

D Tiwari, S Gupta, J Rogers, D Maxwell… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org
Increase in graphics hardware performance and improvements in programmability have
enabled GPUs to evolve from a graphics-specific accelerator to a general-purpose …

Evaluation and mitigation of radiation-induced soft errors in graphics processing units

DAGG de Oliveira, LL Pilla, T Santini… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
Graphics processing units (GPUs) are increasingly attractive for both safety-critical and High-
Performance Computing applications. GPU reliability is a primary concern for both the …

GreenMM: energy efficient GPU matrix multiplication through undervolting

H Zamani, Y Liu, D Tripathy, L Bhuyan… - Proceedings of the ACM …, 2019 - dl.acm.org
The current trend of ever-increasing performance in scientific applications comes with
tremendous growth in energy consumption. In this paper, we present GreenMM framework …

Correcting soft errors online in fast Fourier transform

X Liang, J Chen, D Tao, S Li, P Wu, H Li… - Proceedings of the …, 2017 - dl.acm.org
While many algorithm-based fault tolerance (ABFT) schemes have been proposed to detect
soft errors offline in the fast Fourier transform (FFT) after computation finishes, none of the …

Modern GPUs radiation sensitivity evaluation and mitigation through duplication with comparison

DAG Oliveira, P Rech, HM Quinn… - … on Nuclear Science, 2014 - ieeexplore.ieee.org
Graphics processing units (GPUs) are increasingly common in both safety-critical and high-
performance computing (HPC) applications. Some current supercomputers are composed of …

Comparative analysis of soft-error sensitivity in LU decomposition algorithms on diverse GPUs

G Leon, JM Badia, JA Belloch, A Lindoso… - The Journal of …, 2024 - Springer
Graphics processing units (GPUs) have become integral to embedded systems and
supercomputing centres due to their large memory, cutting-edge technology and high …

GPGPUs ECC efficiency and efficacy

DAG Oliveira, P Rech, LL Pilla… - … on Defect and Fault …, 2014 - ieeexplore.ieee.org
In this paper we assess and discuss the efficiency and overhead of the Error-Correcting
Code (ECC) mechanism available on modern GPGPUs, which are increasingly used for …