GPU devices for safety-critical systems: A survey

J Perez-Cerrolaza, J Abella, L Kosmidis… - ACM Computing …, 2022 - dl.acm.org
Graphics Processing Unit (GPU) devices and their associated software programming
languages and frameworks can deliver the computing performance required to facilitate the …

Error characterization, mitigation, and recovery in flash-memory-based solid-state drives

Y Cai, S Ghose, EF Haratsch, Y Luo… - Proceedings of the …, 2017 - ieeexplore.ieee.org
NAND flash memory is ubiquitous in everyday life today because its capacity has
continuously increased and cost has continuously decreased over decades. This positive …

A survey of techniques for modeling and improving reliability of computing systems

S Mittal, JS Vetter - IEEE Transactions on Parallel and …, 2015 - ieeexplore.ieee.org
Recent trends of aggressive technology scaling have greatly exacerbated the occurrences
and impact of faults in computing systems. This has made 'reliability' a first-order design …

Exploiting correcting codes: On the effectiveness of ecc memory against rowhammer attacks

L Cojocar, K Razavi, C Giuffrida… - 2019 IEEE Symposium …, 2019 - ieeexplore.ieee.org
Given the increasing impact of Rowhammer, and the dearth of adequate other hardware
defenses, many in the security community have pinned their hopes on error-correcting code …

Memory errors in modern systems: The good, the bad, and the ugly

V Sridharan, N DeBardeleben, S Blanchard… - ACM SIGARCH …, 2015 - dl.acm.org
Several recent publications have shown that hardware faults in the memory subsystem are
commonplace. These faults are predicted to become more frequent in future systems that …

Addressing failures in exascale computing

M Snir, RW Wisniewski, JA Abraham… - … Journal of High …, 2014 - journals.sagepub.com
We present here a report produced by a workshop on 'Addressing failures in exascale
computing' held in Park City, Utah, 4–11 August 2012. The charter of this workshop was to …

CSI: Rowhammer–Cryptographic security and integrity against rowhammer

J Juffinger, L Lamster, A Kogler… - … IEEE Symposium on …, 2023 - ieeexplore.ieee.org
In this paper, we present CSI: Rowhammer, a principled hardware-software co-design
Rowhammer mitigation with cryptographic security and integrity guarantees, that does not …

Understanding latency variation in modern DRAM chips: Experimental characterization, analysis, and optimization

KK Chang, A Kashyap, H Hassan, S Ghose… - Proceedings of the …, 2016 - dl.acm.org
Long DRAM latency is a critical performance bottleneck in current systems. DRAM access
latency is defined by three fundamental operations that take place within the DRAM cell …

[BOOK][B] Fault tolerance techniques for high-performance computing

J Dongarra, T Herault, Y Robert - 2015 - Springer
This chapter provides an introduction to resilience methods. The emphasis is on
checkpointing, the de-facto standard technique for resilience in High Performance …

Revisiting memory errors in large-scale production data centers: Analysis and modeling of new trends from the field

J Meza, Q Wu, S Kumar, O Mutlu - 2015 45th Annual IEEE/IFIP …, 2015 - ieeexplore.ieee.org
Computing systems use dynamic random-access memory (DRAM) as main memory. As
prior works have shown, failures in DRAM devices are an important source of errors in …