Adoption protocols for fanout-optimal fault-tolerant termination detection

D Cunningham, D Grove, B Herta, A Iyengar… - Proceedings of the 19th …, 2014 - dl.acm.org

Scale-out programs run on multiple processes in a cluster. In scale-out systems, processes
can fail. Computations using traditional libraries such as MPI fail when any component …

Cited by 92 - Related articles - All 13 versions

[PDF] semanticscholar.org

Scalable replay with partial-order dependencies for message-logging fault tolerance

J Lifflander, E Meneses, H Menon… - 2014 IEEE …, 2014 - ieeexplore.ieee.org

Deterministic replay of a parallel application is commonly used for discovering bugs or to
recover from a hard fault with message-logging fault tolerance. For message passing …

Cited by 25 - Related articles - All 16 versions

[PDF] vu.nl

Fault-tolerant termination detection with Safra's algorithm

G Karlos, W Fokkink, P Fuchs - International Conference on Networked …, 2021 - Springer

Safra's distributed termination detection algorithm employs a logical token ring structure
within a distributed network; only passive nodes forward the token, and a counter in the …

Cited by 2 - Related articles - All 7 versions

[PDF] keio.ac.jp

Car driving behaviour observation using an immersive car driving simulator

Y Tateyama, Y Mori, K Yamamoto, T Ogi… - … Conference on P2P …, 2010 - ieeexplore.ieee.org

Using a car driving simulator, we can observe drivers' behaviors in dangerous situations
safely. We constructed an immersive car driving simulator. We conducted an experiment in a …

Cited by 7 - Related articles - All 9 versions

[PDF] hal.science

Resilient optimistic termination detection for the async-finish model

SS Hamouda, J Milthorpe - … , ISC High Performance 2019, Frankfurt/Main …, 2019 - Springer

Driven by increasing core count and decreasing mean-time-to-failure in supercomputers,
HPC runtime systems must improve support for dynamic task-parallel execution and …

Cited by 4 - Related articles - All 13 versions

Towards resilient Chapel: Design and implementation of a transparent resilience mechanism for Chapel

K Panagiotopoulou, HW Loidl - … of the 3rd International Conference on …, 2015 - dl.acm.org

The exponential increase of components in modern High Performance Computing (HPC)
systems poses a challenge on their resilience: predictions of time between failures on …

Cited by 6 - Related articles

[PDF] osti.gov

Extreme-scale viability of collective communication for resilient task scheduling and work stealing

J Wilke, J Bennett, H Kolla, K Teranishi… - 2014 44th Annual …, 2014 - ieeexplore.ieee.org

Extreme-scale computing will bring significant changes to high performance computing
system architectures. In particular, the increased number of system components is creating a …

Cited by 6 - Related articles - All 6 versions

[PDF] osti.gov

Coordination languages and MPI perturbation theory: The FOX tuple space framework for resilience

JJ Wilke - 2014 IEEE International Parallel & Distributed …, 2014 - ieeexplore.ieee.org

Coordination languages are an established programming model for distributed computing,
but have been largely eclipsed by message passing (MPI) in scientific computing. In contrast …

Cited by 5 - Related articles - All 6 versions

[PDF] vu.nl

A fault-tolerant variant of the Mahapatra-Dutt termination detection algorithm

KB Ardal - 2017 - cs.vu.nl

In distributed systems it is important to know when a distributed algorithm has finished its
computation. No single process can decide when an algorithm has terminated with only its …

Cited by 2 - Related articles

[PDF] vu.nl

Improving Tseng's Fault-Tolerant Termination Detection Algorithm

L Taglialatela - 2021 - cs.vu.nl

Distributed systems are networks of computers that communicate among each other through
message-passing. Such systems can be particularly useful for computing high-workloads …
