Y Chen, X Yang, Q Lin, H Zhang, F Gao, Z Xu… - The world wide web …, 2019 - dl.acm.org
With the rapid growth of cloud service systems and their increasing complexity, service failures become unavoidable. Outages, which are critical service failures, could dramatically …
Cloud services are omnipresent, and critical cloud service failure is a fact of life. In order to retain customers and prevent revenue loss, it is important to provide high reliability …
In cloud-native applications, a large fraction of operational failures, known as outages, result in violations of Service Level Objectives (SLOs). SLOs are defined around specific …
T Yang, J Shen, Y Su, X Ling, Y Yang… - 2021 36th IEEE/ACM …, 2021 - ieeexplore.ieee.org
Service reliability is one of the key challenges that cloud providers have to deal with. In cloud systems, unplanned service failures may cause severe cascading impacts on their …
P Kayongo, J Hoffswell, S Saini, S Garg… - 2022 Working …, 2022 - ieeexplore.ieee.org
Efficient outage detection and remediation is crucial for effectively operating cloud computing systems. To remediate outages, system engineers must quickly identify the …
J Shi, S Jiang, B Xu, Y Xiao - 2023 IEEE 34th International …, 2023 - ieeexplore.ieee.org
The development of the information technology industry has made servers an essential infrastructure for enterprises. Server failure may result in significant economic losses …
A Saha, SCH Hoi - Proceedings of the 44th International Conference on …, 2022 - dl.acm.org
Root Cause Analysis (RCA) of any service-disrupting incident is one of the most critical as well as complex tasks in IT processes, especially for cloud industry leaders like Salesforce …
P Jin, S Zhang, M Ma, H Li, Y Kang, L Li, Y Liu… - Proceedings of the 31st …, 2023 - dl.acm.org
Cloud systems have become increasingly popular in recent years due to their flexibility and scalability. Each time cloud computing applications and services hosted on the cloud are …
J Chen, J Chakraborty, P Clark, K Haverlock… - Proceedings of the …, 2019 - dl.acm.org
Maintaining web-services is a mission-critical task where any down-time means loss of revenue and reputation (of being a reliable service provider). In the current competitive web …