Distributed systems are difficult to implement correctly because they must handle both concurrency and failures: machines may crash at arbitrary points and networks may reorder …
We conducted a cloud outage study (COS) of 32 popular Internet services. We analyzed 1247 headline news articles and public post-mortem reports that detail 597 unplanned outages that …
We conduct a comprehensive study of development and deployment issues of six popular and important cloud systems (Hadoop MapReduce, HDFS, HBase, Cassandra, ZooKeeper …
Self-adaptation is a first-class concern for cloud applications, which should be able to withstand diverse runtime changes. Variations are simultaneously happening both at the …
In distributed systems shared by multiple tenants, effective resource management is an important prerequisite to providing quality of service guarantees. Many systems deployed …
Recent advances in formal verification techniques have enabled the implementation of distributed systems with machine-checked proofs. While results are encouraging, the importance of …
M Lesani, CJ Bell, A Chlipala - ACM SIGPLAN Notices, 2016 - dl.acm.org
Today's Internet services are often expected to stay available and render high responsiveness even in the face of site crashes and network partitions. Theoretical results …
J Rahman, P Lama - 2019 IEEE International Conference on …, 2019 - ieeexplore.ieee.org
Large-scale web services are increasingly adopting cloud-native principles of application design to better utilize the advantages of cloud computing. This involves building an …