Rowhammer: A retrospective

O Mutlu, JS Kim - … Transactions on Computer-Aided Design of …, 2019 - ieeexplore.ieee.org
This retrospective paper describes the RowHammer problem in dynamic random access
memory (DRAM), which was initially introduced by Kim et al. at the ISCA 2014 Conference …

Error characterization, mitigation, and recovery in flash-memory-based solid-state drives

Y Cai, S Ghose, EF Haratsch, Y Luo… - Proceedings of the …, 2017 - ieeexplore.ieee.org
NAND flash memory is ubiquitous in everyday life today because its capacity has
continuously increased and cost has continuously decreased over decades. This positive …

A survey of techniques for modeling and improving reliability of computing systems

S Mittal, JS Vetter - IEEE Transactions on Parallel and …, 2015 - ieeexplore.ieee.org
Recent trends of aggressive technology scaling have greatly exacerbated the occurrences
and impact of faults in computing systems. This has madereliability'a first-order design …

Memory errors in modern systems: The good, the bad, and the ugly

V Sridharan, N DeBardeleben, S Blanchard… - ACM SIGARCH …, 2015 - dl.acm.org
Several recent publications have shown that hardware faults in the memory subsystem are
commonplace. These faults are predicted to become more frequent in future systems that …

A survey of high-performance computing scaling challenges

A Geist, DA Reed - The International Journal of High …, 2017 - journals.sagepub.com
Commodity clusters revolutionized high-performance computing when they first appeared
two decades ago. As scale and complexity have grown, new challenges in reliability and …

The RowHammer problem and other issues we may face as memory becomes denser

O Mutlu - Design, Automation & Test in Europe Conference & …, 2017 - ieeexplore.ieee.org
As memory scales down to smaller technology nodes, new failure mechanisms emerge that
threaten its correct operation. If such failure mechanisms are not anticipated and corrected …

Revisiting memory errors in large-scale production data centers: Analysis and modeling of new trends from the field

J Meza, Q Wu, S Kumar, O Mutlu - 2015 45th Annual IEEE/IFIP …, 2015 - ieeexplore.ieee.org
Computing systems use dynamic random-access memory (DRAM) as main memory. As
prior works have shown, failures in DRAM devices are an important source of errors in …

Lessons learned from the analysis of system failures at petascale: The case of blue waters

C Di Martino, Z Kalbarczyk, RK Iyer… - 2014 44th Annual …, 2014 - ieeexplore.ieee.org
This paper provides an analysis of failures and their impact for Blue Waters, the Cray hybrid
(CPU/GPU) supercomputer at the University of Illinois at Urbana-Champaign. The analysis …

Failures in large scale systems: long-term measurement, analysis, and implications

S Gupta, T Patel, C Engelmann, D Tiwari - Proceedings of the …, 2017 - dl.acm.org
Resilience is one of the key challenges in maintaining high efficiency of future extreme scale
supercomputers. Researchers and system practitioners rely on field-data studies to …

[PDF][PDF] Research problems and opportunities in memory systems

O Mutlu, L Subramanian - Supercomputing frontiers and …, 2014 - superfri.susu.ru
The memory system is a fundamental performance and energy bottleneck in almost all
computing systems. Recent system design, application, and technology trends that require …