A survey of rollback-recovery protocols in message-passing systems

EN Elnozahy, L Alvisi, YM Wang… - ACM Computing Surveys …, 2002 - dl.acm.org
This survey covers rollback-recovery techniques that do not require special language
constructs. In the first part of the survey we classify rollback-recovery protocols into …

The visualization of parallel systems: An overview

E Kraemer, JT Stasko - Journal of Parallel and Distributed Computing, 1993 - Elsevier
We present an overview of visualization tools for parallel systems focusing on parallel
debuggers, performance evaluation systems, and program visualization systems. Our goal …

RAP: A real-time communication architecture for large-scale wireless sensor networks

C Lu, BM Blum, TF Abdelzaher… - … . Eighth IEEE Real …, 2002 - ieeexplore.ieee.org
Large-scale wireless sensor networks represent a new generation of real-time embedded
systems with significantly different communication constraints from traditional networked …

Detecting causal relationships in distributed computations: In search of the holy grail

R Schwarz, F Mattern - Distributed computing, 1994 - Springer
The paper shows that characterizing the causal relationship between significant events is an
important but non-trivial aspect for understanding the behavior of distributed programs. An …

System and method for monitoring and analyzing the execution of computer programs

S Wygodny, D Barboy, G Prouss, A Vorobey - US Patent 6,282,701, 2001 - Google Patents
A software system is disclosed which facilitates the process of tracing the execution paths of
a program, called the client. The tracing is performed without requiring modifications to the …

Replay debugging for distributed applications

DM Geels, G Altekar, S Shenker, I Stoica - 2006 - usenix.org
We have developed a new replay debugging tool, liblog, for distributed C/C++ applications.
It logs the execution of deployed application processes and replays them deterministically …

Jockey: a user-space library for record-replay debugging

Y Saito - Proceedings of the sixth international symposium on …, 2005 - dl.acm.org
Jockey is an execution record/replay tool for debugging Linux programs. It records
invocations of system calls and CPU instructions with timing-dependent effects and later …

System and method for conditional tracing of computer programs

S Wygodny, V Golender, I Ben-Moshe… - US Patent …, 2006 - Google Patents
A software system is disclosed which facilitates the process of tracing the execution paths of
a program, called the client. The tracing is performed without requiring modifications to the …

Optimal tracing and replay for debugging shared-memory parallel programs

RHB Netzer - Proceedings of the 1993 ACM/ONR workshop on …, 1993 - dl.acm.org
Execution replay is a crucial part of debugging. Because explicitly parallel shared-memory
programs can be nondeterministic, a tool is required that traces executions so they can be …

BigDebug: Debugging primitives for interactive big data processing in Spark

MA Gulzar, M Interlandi, S Yoo, SD Tetali… - Proceedings of the 38th …, 2016 - dl.acm.org
Developers use cloud computing platforms to process a large quantity of data in parallel
when developing big data analytics. Debugging the massive parallel computations that run …