Quantifying the impact of data replication on error propagation

Z Ozturk, HR Topcuoglu, MT Kandemir - Cluster Computing, 2023 - Springer
Various technological developments in the microprocessor world make modern computing
systems more vulnerable to soft errors than in the past, and consequently fault tolerance …

Where have all the cycles gone? – Investigating runtime overheads of OS-assisted replication

B Döbel, H Härtig - 2013 - dl.gi.de
In order to allow user-level applications to tolerate transient hardware faults, we developed
Romain, an operating system service that transparently replicates unmodified binary …

Reducing resource consumption of replication using dynamic replicas

R Muschner, DIB Döbel - 2013 - Citeseer
Operating systems (OS) manage computer hardware resources and provide services to the
user's applications. For reliable execution the OS depends on error free hardware. The …

Efficient fault tolerance in multi-media applications through selective instruction replication

A Sundaram, A Aakel, D Lockhart, D Thaker… - Proceedings of the …, 2008 - dl.acm.org
As voltages decrease, soft errors are expected to become an increasing problem in
maintaining program correctness. Unfortunately, previous mechanisms to improve processor …

Active replication of multithreaded applications

C Basile, Z Kalbarczyk, RK Iyer - IEEE transactions on parallel …, 2006 - ieeexplore.ieee.org
Software-based active replication is expensive in terms of performance overhead.
Multithreading can help improve performance; however, thread scheduling is a source of …

Selective replication: A lightweight technique for soft errors

FJ Vera Rivera, J Abella Ferrer… - ACM Transactions …, 2009 - upcommons.upc.edu
Soft errors are an important challenge in contemporary microprocessors. Modern processors
have caches and large memory arrays protected by parity or error detection and correction …

A user-assisted thread-level vulnerability assessment tool

I Oz, HR Topcuoglu, O Tosun - Concurrency and Computation …, 2019 - Wiley Online Library
The system reliability becomes a critical concern in modern architectures with the scale
down of circuits. To deal with soft errors, the replication of system resources has been used …

Bounding error detection latencies for replicated execution

M Kriegel - Bachelor thesis, TU Dresden, 2013 - os.inf.tu-dresden.de
Ever since transistor-based computers were invented in the 1960s, it is common that these
systems suffer from hardware errors. One particular error is a transient hardware error which …

Selective replication: A lightweight technique for soft errors

X Vera, J Abella, J Carretero, A González - ACM Transactions on …, 2010 - dl.acm.org
Soft errors are an important challenge in contemporary microprocessors. Modern processors
have caches and large memory arrays protected by parity or error detection and correction …

Coping with silent and fail-stop errors at scale by combining replication and checkpointing

A Benoit, A Cavelan, F Cappello, P Raghavan… - Journal of Parallel and …, 2018 - Elsevier
This paper provides a model and an analytical study of replication as a technique to cope
with silent errors, as well as a mixture of both silent and fail-stop errors on large-scale …