GPU devices for safety-critical systems: A survey

J Perez-Cerrolaza, J Abella, L Kosmidis… - ACM Computing …, 2022 - dl.acm.org
Graphics Processing Unit (GPU) devices and their associated software programming
languages and frameworks can deliver the computing performance required to facilitate the …

Evaluation and mitigation of soft-errors in neural network-based object detection in three GPU architectures

FF dos Santos, L Draghetti, L Weigel… - 2017 47th Annual …, 2017 - ieeexplore.ieee.org
In this paper, we evaluate the reliability of the You Only Look Once (YOLO) object detection
framework. We have exposed to controlled neutron beams GPUs designed with three …

Experimental and analytical study of Xeon Phi reliability

D Oliveira, L Pilla, N DeBardeleben… - Proceedings of the …, 2017 - dl.acm.org
We present an in-depth analysis of transient faults effects on HPC applications in Intel Xeon
Phi processors based on radiation experiments and high-level fault injection. Besides …

GPGPUs: How to combine high computational power with high reliability

LB Gomez, F Cappello, L Carro… - … , Automation & Test …, 2014 - ieeexplore.ieee.org
GPGPUs are used increasingly in several domains, from gaming to different kinds of
computationally intensive applications. In many applications GPGPU reliability is becoming …

Code-dependent and architecture-dependent reliability behaviors

V Fratin, D Oliveira, C Lunardi, F Santos… - 2018 48th Annual …, 2018 - ieeexplore.ieee.org
The increased need for computing capabilities and higher efficiency has stimulated
industries to make available in the market novel architectures with increased complexity …

On the efficacy of ECC and the benefits of FinFET transistor layout for GPU reliability

C Lunardi, F Previlon, D Kaeli… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Using error-correcting codes (ECCs) is considered one of the most effective ways to mask
the effects of radiation-induced faults in memory and computing devices. Unfortunately, with …

Radiation-induced error criticality in modern HPC parallel accelerators

DAG De Oliveira, LL Pilla, M Hanzich… - … Symposium on High …, 2017 - ieeexplore.ieee.org
In this paper, we evaluate the error criticality of radiation-induced errors on modern
High-Performance Computing (HPC) accelerators (Intel Xeon Phi and NVIDIA K40) through a …

A-ABFT: Autonomous algorithm-based fault tolerance for matrix multiplications on graphics processing units

C Braun, S Halder… - 2014 44th Annual IEEE …, 2014 - ieeexplore.ieee.org
Graphics processing units (GPUs) enable large-scale scientific applications and simulations
on the desktop. To allow scientific computing on GPUs with high performance and reliability …

Kernel and layer vulnerability factor to evaluate object detection reliability in GPUs

F Fernandes dos Santos, L Carro… - IET Computers & Digital …, 2019 - Wiley Online Library
Video recognition applications running on Graphics Processing Units are composed of
heterogeneous software portions, such as kernels or layers for neural networks. The authors …

On the evaluation of soft-errors detection techniques for GPGPUs

D Sabena, MS Reorda, L Sterpone… - 2013 8th IEEE Design …, 2013 - ieeexplore.ieee.org
Recently, General Purpose Graphic Processing Units (GPGPUs) have begun to be preferred
to CPUs for several computationally intensive applications, not necessarily related to …