FaultProfIT: Hierarchical Fault Profiling of Incident Tickets in Large-scale Cloud Systems

J Huang, J Liu, Z Chen, Z Jiang, Y Li, J Gu… - Proceedings of the 46th …, 2024 - dl.acm.org
Postmortem analysis is essential to incident management in cloud systems,
providing valuable insights for improving system reliability and robustness. At CloudA1 …

Outage-Watch: Early Prediction of Outages using Extreme Event Regularizer

S Agarwal, S Chakraborty, S Garg, S Bisht… - Proceedings of the 31st …, 2023 - dl.acm.org
Cloud services are omnipresent, and critical cloud service failures are a fact of life. To
retain customers and prevent revenue loss, it is important to provide high reliability …