Measuring the impact of memory errors on application performance

M Gottscho, M Shoaib, S Govindan… - IEEE Computer …, 2016 - ieeexplore.ieee.org
Memory reliability is a key factor in the design of warehouse-scale computers. Prior work has
focused on the performance overheads of memory fault-tolerance schemes when errors do …

Memory errors in modern systems: The good, the bad, and the ugly

V Sridharan, N DeBardeleben, S Blanchard… - ACM SIGARCH …, 2015 - dl.acm.org
Several recent publications have shown that hardware faults in the memory subsystem are
commonplace. These faults are predicted to become more frequent in future systems that …

[PDF][PDF] A realistic evaluation of memory hardware errors and software system susceptibility

X Li, MC Huang, K Shen, L Chu - 2010 USENIX Annual Technical …, 2010 - usenix.org
Memory hardware reliability is an indispensable part of whole-system dependability. This
paper presents the collection of realistic memory hardware error traces (including transient …

Rampage: Graceful degradation management for memory errors in commodity linux servers

H Schirmeier, J Neuhalfen, I Korb… - 2011 IEEE 17th …, 2011 - ieeexplore.ieee.org
Memory errors are a major source of reliability problems in current computers. Undetected
errors may result in program termination, or, even worse, silent data corruption. Recent …

Sampler: Pmu-based sampling to detect memory errors latent in production software

S Silvestro, H Liu, T Zhang, C Jung… - 2018 51st Annual …, 2018 - ieeexplore.ieee.org
Deployed software is still faced with numerous in-production memory errors. They can
significantly affect system reliability and security, causing application crashes, erratic …

Evaluating operating system vulnerability to memory errors

KB Ferreira, K Pedretti, R Brightwell… - Proceedings of the 2nd …, 2012 - dl.acm.org
Reliability is of great concern to the scalability of extreme-scale systems. Of particular
concern are soft errors in main memory, which are a leading cause of failures on current …

[PDF][PDF] Analysis and modeling of memory errors from large-scale field data collection

T Siddiqua, AE Papathanasiou, A Biswas… - SELSE, 2013 - Citeseer
Main memory reliability plays a crucial role in overall system reliability. Unfortunately, our
collective understanding of the rate, pattern, and impact of memory errors is inadequate and …

HARP: Practically and effectively identifying uncorrectable errors in memory chips that use on-die error-correcting codes

M Patel, GF de Oliveira, O Mutlu - MICRO-54: 54th Annual IEEE/ACM …, 2021 - dl.acm.org
Aggressive storage density scaling in modern main memories causes increasing error rates
that are addressed using error-mitigation techniques. State-of-the-art techniques for …

Reducing minor page fault overheads through enhanced page walker

C Tirumalasetty, CC Chou, N Reddy, P Gratz… - ACM Transactions on …, 2022 - dl.acm.org
Application virtual memory footprints are growing rapidly in all systems from servers down to
smartphones. To address this growing demand, system integrators are incorporating ever …

The efficacy of error mitigation techniques for DRAM retention failures: A comparative experimental study

S Khan, D Lee, Y Kim, AR Alameldeen… - ACM SIGMETRICS …, 2014 - dl.acm.org
As DRAM cells continue to shrink, they become more susceptible to retention failures.
DRAM cells that permanently exhibit short retention times are fairly easy to identify and …