Increasing relevance of memory hardware errors: a case for recoverable programming models

D Milojicic, A Messer, J Shau, G Fu… - Proceedings of the 9th …, 2000 - dl.acm.org
It is a common belief that most of computer system failures nowadays stem from
programming errors. Computer systems are becoming more complex and harder to maintain …

Evaluating operating system vulnerability to memory errors

KB Ferreira, K Pedretti, R Brightwell… - Proceedings of the 2nd …, 2012 - dl.acm.org
Reliability is of great concern to the scalability of extreme-scale systems. Of particular
concern are soft errors in main memory, which are a leading cause of failures on current …

Susceptibility of commodity systems and software to memory soft errors

A Messer, P Bernadat, G Fu, D Chen… - IEEE transactions on …, 2004 - ieeexplore.ieee.org
It is widely understood that most system downtime is accounted for by programming errors
and administration time. However, a growing body of work has indicated an increasing …

Exterminator: automatically correcting memory errors with high probability

G Novark, ED Berger, BG Zorn - Proceedings of the 28th ACM SIGPLAN …, 2007 - dl.acm.org
Programs written in C and C++ are susceptible to memory errors, including buffer overflows
and dangling pointers. These errors, which can lead to crashes, erroneous execution, and …

[CITATION][C] Introduction to the special issue on soft errors and data integrity in terrestrial computer systems

J Maiz, N Seifert - IEEE Transactions on Device and Materials …, 2005 - ieeexplore.ieee.org
MORE than a quarter century ago, May and Woods of Intel reported on alpha-particle-
induced soft errors (SEs) in their 2107-series 16-kb DRAMs. This paper represents the first …

Resilience in numerical methods: a position on fault models and methodologies

J Elliott, M Hoemmen, F Mueller - arXiv preprint arXiv:1401.3013, 2014 - arxiv.org
Future extreme-scale computer systems may expose silent data corruption (SDC) to
applications, in order to save energy or increase performance. However, resilience research …

[PDF] A realistic evaluation of memory hardware errors and software system susceptibility

X Li, MC Huang, K Shen, L Chu - 2010 USENIX Annual Technical …, 2010 - usenix.org
Memory hardware reliability is an indispensable part of whole-system dependability. This
paper presents the collection of realistic memory hardware error traces (including transient …

[PDF] Susceptibility of modern systems and software to soft errors

A Messer, P Bernadat, G Fu, D Chen… - In HP Labs Technical …, 2001 - academia.edu
It is widely understood that most downtime is accounted for by programming errors and
administration time. However, recent work has indicated an increasing cause of downtime …

Mars Attacks! Software Protection Against Space Radiation

H Wang, S Myint, V Verma, Y Winetraub… - Proceedings of the …, 2023 - dl.acm.org
Due to their low cost and the need to run computationally-intensive algorithms locally,
satellites and spacecraft are increasingly employing off-the-shelf computing hardware …

On the verification of memory management mechanisms

I Dalinger, M Hillebrand, W Paul - … and Verification Methods: 13th IFIP WG …, 2005 - Springer
We report on the design and formal verification of a complex processor supporting address
translation by means of a memory management unit (MMU). We give a paper and pencil …