Y Cai, S Ghose, EF Haratsch, Y Luo… - Proceedings of the …, 2017 - ieeexplore.ieee.org
NAND flash memory is ubiquitous in everyday life today because its capacity has continuously increased and cost has continuously decreased over decades. This positive …
S Mittal, JS Vetter - IEEE Transactions on Parallel and …, 2015 - ieeexplore.ieee.org
Recent trends of aggressive technology scaling have greatly exacerbated the occurrences and impact of faults in computing systems. This has madereliability'a first-order design …
Several recent publications have shown that hardware faults in the memory subsystem are commonplace. These faults are predicted to become more frequent in future systems that …
A Geist, DA Reed - The International Journal of High …, 2017 - journals.sagepub.com
Commodity clusters revolutionized high-performance computing when they first appeared two decades ago. As scale and complexity have grown, new challenges in reliability and …
O Mutlu - Design, Automation & Test in Europe Conference & …, 2017 - ieeexplore.ieee.org
As memory scales down to smaller technology nodes, new failure mechanisms emerge that threaten its correct operation. If such failure mechanisms are not anticipated and corrected …
J Meza, Q Wu, S Kumar, O Mutlu - 2015 45th Annual IEEE/IFIP …, 2015 - ieeexplore.ieee.org
Computing systems use dynamic random-access memory (DRAM) as main memory. As prior works have shown, failures in DRAM devices are an important source of errors in …
This paper provides an analysis of failures and their impact for Blue Waters, the Cray hybrid (CPU/GPU) supercomputer at the University of Illinois at Urbana-Champaign. The analysis …
Resilience is one of the key challenges in maintaining high efficiency of future extreme scale supercomputers. Researchers and system practitioners rely on field-data studies to …
O Mutlu, L Subramanian - Supercomputing frontiers and …, 2014 - superfri.susu.ru
The memory system is a fundamental performance and energy bottleneck in almost all computing systems. Recent system design, application, and technology trends that require …