Identifying bad software changes via multimodal anomaly detection for online service systems

N Zhao, J Chen, Z Yu, H Wang, J Li, B Qiu… - Proceedings of the 29th …, 2021 - dl.acm.org
In large-scale online service systems, software changes are inevitable and frequent. Due to
importing new code or configurations, changes are likely to incur incidents and destroy user …

Heterogeneous anomaly detection for software systems via semi-supervised cross-modal attention

C Lee, T Yang, Z Chen, Y Su, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Prompt and accurate detection of system anomalies is essential to ensure the reliability of
software systems. Unlike manual efforts that exploit all available run-time information …

Identifying root-cause metrics for incident diagnosis in online service systems

C Wu, N Zhao, L Wang, X Yang, S Li… - 2021 IEEE 32nd …, 2021 - ieeexplore.ieee.org
Incidents in online service systems could incur poor user experience and tremendous
economic loss. To reduce the influence of incidents and guarantee service reliability, it is …

iFeedback: Exploiting user feedback for real-time issue detection in large-scale online service systems

W Zheng, H Lu, Y Zhou, J Liang… - 2019 34th IEEE/ACM …, 2019 - ieeexplore.ieee.org
Large-scale online systems are complex, fast-evolving, and hardly bug-free despite the
testing efforts. Backend system monitoring cannot detect many types of issues, such as UI …

Identifying linked incidents in large-scale online service systems

Y Chen, X Yang, H Dong, X He, H Zhang… - Proceedings of the 28th …, 2020 - dl.acm.org
In large-scale online service systems, incidents occur frequently due to a variety of causes,
from updates of software and hardware to changes in operation environment. These …

Actionable and interpretable fault localization for recurring failures in online service systems

Z Li, N Zhao, M Li, X Lu, L Wang, D Chang… - Proceedings of the 30th …, 2022 - dl.acm.org
Fault localization is challenging in an online service system due to its monitoring data's large
volume and variety and complex dependencies across/within its components (e.g., services …

Adaptive performance anomaly detection for online service systems via pattern sketching

Z Chen, J Liu, Y Su, H Zhang, X Ling, Y Yang… - Proceedings of the 44th …, 2022 - dl.acm.org
To ensure the performance of online service systems, their status is closely monitored with
various software and system metrics. Performance anomalies represent the performance …

Rapid and robust impact assessment of software changes in large internet-based services

S Zhang, Y Liu, D Pei, Y Chen, X Qu, S Tao… - Proceedings of the 11th …, 2015 - dl.acm.org
The detection of performance changes in software change roll-outs in Internet-based
services is crucial for an operations team, because it allows timely roll-back of a software …

MADneSs: A multi-layer anomaly detection framework for complex dynamic systems

T Zoppi, A Ceccarelli… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Anomaly detection can infer the presence of errors without observing the target services, but
detecting variations in the observable parts of the system on which the services reside. This …

How to mitigate the incident? An effective troubleshooting guide recommendation technique for online service systems

J Jiang, W Lu, J Chen, Q Lin, P Zhao, Y Kang… - Proceedings of the 28th …, 2020 - dl.acm.org
In recent years, more and more traditional shrink-wrapped software is provided as 7x24
online services. Incidents (events that lead to service disruptions or outages) could affect …