Clonos: Consistent High-Availability for Distributed Stream Processing through Causal Logging- 学术资源搜索

[PDF][PDF] Clonos: Consistent High-Availability for Distributed Stream Processing through Causal Logging

PMF Silvestre - 2020 - asc.di.fct.unl.pt

2020•asc.di.fct.unl.pt

Nowadays, distributed stream processing systems lie in the backbone of businesses, as a
backend for critical event-driven applications such as real-time fraud detection or stock
trading. Given their critical nature these systems should be expressive, performant, highly-
available and maintain state consistency after failure. However, current fault-tolerance
solutions forego one of these four requirements. Highly-available systems sacrifice either
consistency or expressiveness and often performance, while more reliable systems have …

Abstract

Nowadays, distributed stream processing systems lie in the backbone of businesses, as a backend for critical event-driven applications such as real-time fraud detection or stock trading. Given their critical nature these systems should be expressive, performant, highly-available and maintain state consistency after failure. However, current fault-tolerance solutions forego one of these four requirements. Highly-available systems sacrifice either consistency or expressiveness and often performance, while more reliable systems have slow and blocking recovery.

In this thesis, we describe Clonos, a highly-available stream processing system that instantly switches the execution of failed operators to passive standbys. Our approach is non-blocking as it uses localized recovery, treating only the failed operators. By additionally forcing recovering operators to replay nondeterministic events Clonos achieves consistent recovery. To manage nondeterminism, we adapt causal logging, a rollback recovery method that logs such events in-memory and propagates them causally, to the stream processing paradigm. Clonos is configurable, allowing one to trade-off overhead for safety. To implement Clonos we re-engineered the distributed runtime of Apache Flink, a state-of-the-art stream processing system. To evaluate the performance of Clonos in terms of throughput, latency, network bandwidth and recovery time we perform overhead and failure experiments using both realistic and synthetic workloads. Clonos delivers upwards of 10 times faster recovery times without blocking and with much lower latency, at the cost of 11% throughput overhead on realistic workloads, when compared to state-of-the-art reliable systems. Clonos is more expressive than past highly-available systems, supporting a much larger set of use-cases. Clonos’ use of causal logging also opens a plethora of new opportunities, such as transaction-less exactly-once delivery guarantees and consistent non-blocking reconfiguration.

asc.di.fct.unl.pt

展开收起

被引用次数：1 相关文章所有 5 个版本

以上显示的是最相近的搜索结果。查看全部搜索结果

Google学术搜索按钮

安装不用了

example.edu/paper.pdf

搜索

获取 PDF 文件

引用

References

高级搜索

QQ 群

[PDF][PDF] Clonos: Consistent High-Availability for Distributed Stream Processing through Causal Logging

引用