Message logging: Pessimistic, optimistic, causal, and optimal

EN Elnozahy, L Alvisi, YM Wang… - ACM Computing Surveys …, 2002 - dl.acm.org

This survey covers rollback-recovery techniques that do not require special language
constructs. In the first part of the survey we classify rollback-recovery protocols into …

Cited by 2571 · Related articles · All 54 versions

[PDF] vu.nl

Replication for web hosting systems

S Sivasubramanian, M Szymaniak, G Pierre… - ACM Computing …, 2004 - dl.acm.org

Replication is a well-known technique to improve the accessibility of Web sites. It generally
offers reduced client latencies and increases a site's availability. However, applying …

Cited by 210 · Related articles · All 16 versions

[PDF] mit.edu

Tachyon: Reliable, memory speed storage for cluster computing frameworks

H Li, A Ghodsi, M Zaharia, S Shenker… - Proceedings of the ACM …, 2014 - dl.acm.org

Tachyon is a distributed file system enabling reliable data sharing at memory speed across
cluster computing frameworks. While caching today improves read workloads, writes are …

Cited by 477 · Related articles · All 29 versions

[PDF] dgma.donetsk.ua

[BOOK] Distributed systems

M Van Steen, AS Tanenbaum - 2017 - dgma.donetsk.ua

This is the third edition of “Distributed Systems.” In many ways, it is a huge difference
compared to the previous editions, the most important one perhaps being that we have fully …

Cited by 291 · Related articles · All 18 versions

[PDF] psu.edu

The design and implementation of checkpoint/restart process fault tolerance for Open MPI

J Hursey, JM Squyres, TI Mattox… - 2007 IEEE …, 2007 - ieeexplore.ieee.org

To be able to fully exploit ever larger computing platforms, modern HPC applications and
system software must be able to tolerate inevitable faults. Historically, MPI implementations …

Cited by 267 · Related articles · All 14 versions

[PDF] epfl.ch

Database replication using generalized snapshot isolation

S Elnikety, F Pedone… - 24th IEEE Symposium on …, 2005 - ieeexplore.ieee.org

Generalized snapshot isolation extends snapshot isolation as used in Oracle and other
databases in a manner suitable for replicated databases. While (conventional) snapshot …

Cited by 270 · Related articles · All 15 versions

[PDF] hal.science

Uncoordinated checkpointing without domino effect for send-deterministic MPI applications

A Guermouche, T Ropars, E Brunet… - … Parallel & Distributed …, 2011 - ieeexplore.ieee.org

As reported by many recent studies, the mean time between failures of future post-petascale
supercomputers is likely to reduce, compared to the current situation. The most popular fault …

Cited by 183 · Related articles · All 17 versions

[PS] usenix.org

[PS] Adaptive and reliable parallel computing on networks of workstations

RD Blumofe, PA Lisiecki - … Annual Technical Conference on UNIX and …, 1997 - usenix.org

In this paper, we present the design of Cilk-NOW, a runtime system that adaptively and
reliably executes functional Cilk programs in parallel on a network of UNIX workstations. Cilk …

Cited by 205 · Related articles · All 13 versions

[PDF] acm.org

Clonos: Consistent causal recovery for highly-available streaming dataflows

PF Silvestre, M Fragkoulis, D Spinellis… - Proceedings of the …, 2021 - dl.acm.org

Stream processing lies in the backbone of modern businesses, being employed for mission
critical applications such as real-time fraud detection, car-trip fare calculations, traffic …

Cited by 28 · Related articles · All 7 versions

[PDF] acm.org

Lineage stash: fault tolerance off the critical path

S Wang, J Liagouris, R Nishihara, P Moritz… - Proceedings of the 27th …, 2019 - dl.acm.org

As cluster computing frameworks such as Spark, Dryad, Flink, and Ray are being deployed
in mission critical applications and on larger and larger clusters, their ability to tolerate …

Cited by 51 · Related articles · All 10 versions
