A survey of fault-tolerance and fault-recovery techniques in parallel systems

M Treaster - arXiv preprint cs/0501002, 2005 - arxiv.org
Supercomputing systems today often come in the form of large numbers of commodity
systems linked together into a computing cluster. These systems, like any distributed system …

[PDF] A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems

M Treaster - arXiv preprint cs.DC/0501002, 2005 - Citeseer

A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems

M Treaster - arXiv e-prints, 2004 - ui.adsabs.harvard.edu

[PDF] A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems

M Treaster - arXiv preprint cs/0501002 - Citeseer