A survey of rollback-recovery protocols in message-passing systems

EN Elnozahy, L Alvisi, YM Wang… - ACM Computing Surveys …, 2002 - dl.acm.org
This survey covers rollback-recovery techniques that do not require special language
constructs. In the first part of the survey we classify rollback-recovery protocols into …

Flashback: A lightweight extension for rollback and deterministic replay for software debugging

SM Srinivasan, S Kandula, CR Andrews… - USENIX Annual Technical …, 2004 - usenix.org
Software robustness has significant impact on system availability. Unfortunately, finding
software bugs is a very challenging task because many bugs are hard to reproduce. While …

The cost of recovery in message logging protocols

S Rao, L Alvisi, HM Vin - IEEE Transactions on Knowledge and …, 2000 - ieeexplore.ieee.org
Past research in message logging has focused on studying the relative overhead imposed
by pessimistic, optimistic and causal protocols during failure-free executions. In this paper …

A survey and review of the current state of rollback‐recovery for cluster systems

A Maloney, A Goscinski - Concurrency and Computation …, 2009 - Wiley Online Library
A variety of research problems exist that require considerable time and computational
resources to solve. Attempting to solve these problems produces long‐running applications …

[BOOK] Concurrent and distributed computing in Java

VK Garg - 2005 - books.google.com
Concurrent and Distributed Computing in Java addresses fundamental concepts in
concurrent computing with Java examples. The book consists of two parts. The first part …

Egida: An extensible toolkit for low-overhead fault-tolerance

S Rao, L Alvisi, HM Vin - Digest of Papers. Twenty-Ninth Annual …, 1999 - ieeexplore.ieee.org
We discuss the design and implementation of Egida, an object-oriented toolkit designed to
support transparent rollback-recovery. Egida exports a simple specification language that …

CheckMate: Evaluating Checkpointing Protocols for Streaming Dataflows

G Siachamis, K Psarakis, M Fragkoulis… - arXiv preprint arXiv …, 2024 - arxiv.org
Stream processing in the last decade has seen broad adoption in both commercial and
research settings. One key element for this success is the ability of modern stream …

An efficient optimistic message logging scheme for recoverable mobile computing systems

T Park, N Woo, HY Yeom - IEEE Transactions on Mobile …, 2002 - ieeexplore.ieee.org
A number of checkpointing and message logging algorithms have been proposed to support
fault tolerance of mobile computing systems. However, little attention has been paid to the …

Feedback control-based QoS guarantees in web application servers

W Pan, D Mu, H Wu, L Yao - 2008 10th IEEE International …, 2008 - ieeexplore.ieee.org
This paper considers providing two types of QoS guarantees, proportional delay
differentiation (PDD) and absolute delay guarantee (ADG), in the database connection pool …

Distributed recovery with κ-optimistic logging

OP Damani, VK Garg, YM Wang - US Patent 5,938,775, 1999 - Google Patents
A fault tolerant message passing system includes a plurality of interconnected processors
with storage and a watchdog process wherein the processors may undergo failure. A …