GPU devices for safety-critical systems: A survey

J Perez-Cerrolaza, J Abella, L Kosmidis… - ACM Computing …, 2022 - dl.acm.org
Graphics Processing Unit (GPU) devices and their associated software programming
languages and frameworks can deliver the computing performance required to facilitate the …

A survey on multithreading alternatives for soft error fault tolerance

I Oz, S Arslan - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Smaller transistor sizes and reduction in voltage levels in modern microprocessors induce
higher soft error rates. This trend makes reliability a primary design constraint for computer …

Warped-slicer: Efficient intra-SM slicing through dynamic resource partitioning for GPU multiprogramming

Q Xu, H Jeon, K Kim, WW Ro… - ACM SIGARCH Computer …, 2016 - dl.acm.org
As technology scales, GPUs are forecasted to incorporate an ever-increasing amount of
computing resources to support thread-level parallelism. But even with the best effort …

Warped-compression: Enabling power efficient GPUs through register compression

S Lee, K Kim, G Koo, H Jeon, WW Ro… - ACM SIGARCH …, 2015 - dl.acm.org
This paper presents Warped-Compression, a warp-level register compression scheme for
reducing GPU power consumption. This work is motivated by the observation that the …

Understanding error propagation in GPGPU applications

G Li, K Pattabiraman, CY Cher… - SC'16: Proceedings of …, 2016 - ieeexplore.ieee.org
GPUs have emerged as general-purpose accelerators in high-performance computing
(HPC) and scientific applications. However, the reliability characteristics of GPU applications …

Optimizing software-directed instruction replication for GPU error detection

A Mahmoud, SKS Hari, MB Sullivan… - … Conference for High …, 2018 - ieeexplore.ieee.org
Application execution on safety-critical and high-performance computer systems must be
resilient to transient errors. As GPUs become more pervasive in such systems, they must …

Hi-fi playback: Tolerating position errors in shift operations of racetrack memory

C Zhang, G Sun, X Zhang, W Zhang, W Zhao… - Proceedings of the …, 2015 - dl.acm.org
Racetrack memory is an emerging non-volatile memory based on spintronic domain wall
technology. It can achieve ultra-high storage density. Also, its read/write speed is …

Real-world design and evaluation of compiler-managed GPU redundant multithreading

J Wadden, A Lyashevsky, S Gurumurthi… - ACM SIGARCH …, 2014 - dl.acm.org
Reliability for general purpose processing on the GPU (GPGPU) is becoming a weak link in
the construction of reliable supercomputer systems. Because hardware protection is …

Mascar: Speeding up GPU warps by reducing memory pitstops

A Sethia, DA Jamshidi, S Mahlke - 2015 IEEE 21st International …, 2015 - ieeexplore.ieee.org
With the prevalence of GPUs as throughput engines for data parallel workloads, the
landscape of GPU computing is changing significantly. Non-graphics workloads with high …

Warped gates: Gating aware scheduling and power gating for GPGPUs

M Abdel-Majeed, D Wong, M Annavaram - … of the 46th Annual IEEE/ACM …, 2013 - dl.acm.org
With the widespread adoption of GPGPUs in varied application domains, new opportunities
open up to improve GPGPU energy efficiency. Due to inherent application-level …