AIOps solutions for incident management: Technical guidelines and a comprehensive literature review

Y Remil, A Bendimerad, R Mathonat… - arXiv preprint arXiv …, 2024 - arxiv.org
The management of modern IT systems poses unique challenges, necessitating scalability,
reliability, and efficiency in handling extensive data streams. Traditional methods, reliant on …

Pre-training code representation with semantic flow graph for effective bug localization

Y Du, Z Yu - Proceedings of the 31st ACM Joint European Software …, 2023 - dl.acm.org
Enlightened by the big success of pre-training in natural language processing, pre-trained
models for programming languages have been widely used to promote code intelligence in …

AutoKAD: Empowering KPI Anomaly Detection with Label-Free Deployment

Z Yu, C Pei, S Zhang, X Wen, J Li… - 2023 IEEE 34th …, 2023 - ieeexplore.ieee.org
Monitoring Key Performance Indicators (KPIs) and detecting anomalies in online service
systems is critical. However, choosing the right KPI anomaly detection algorithm and …

Identifying Root-Cause Changes for User-Reported Incidents in Online Service Systems

Y Zhao, L Jiang, Y Tao, S Zhang, C Wu… - 2023 IEEE 34th …, 2023 - ieeexplore.ieee.org
In online service systems, a majority of incidents are caused by changes, which can
influence user experience and cause huge economic loss. Experiences with a real-world …

How to Manage Change-Induced Incidents? Lessons from the Study of Incident Life Cycle

Y Zhao, L Jiang, Y Tao, S Zhang, C Wu… - 2023 IEEE 34th …, 2023 - ieeexplore.ieee.org
In online service systems, software changes cause a majority of incidents (i.e., unplanned
interruptions and outages). Managing change-induced incidents efficiently is crucial for …

ChangeRCA: Finding Root Causes from Software Changes in Large Online Systems

G Yu, P Chen, Z He, Q Yan, Y Luo, F Li… - Proceedings of the ACM …, 2024 - dl.acm.org
In large-scale online service systems, the occurrence of software changes is inevitable and
frequent. Despite rigorous pre-deployment testing practices, the presence of defective …

Detection Latencies of Anomaly Detectors: An Overlooked Perspective?

T Puccetti, A Ceccarelli - arXiv preprint arXiv:2402.09082, 2024 - arxiv.org
The ever-evolving landscape of attacks, coupled with the growing complexity of ICT systems,
makes crafting anomaly-based intrusion detectors (ID) and error detectors (ED) a difficult …

On the Difficulty of Identifying Incident-Inducing Changes

E Kapel, L Cruz, D Spinellis… - Proceedings of the 46th …, 2024 - dl.acm.org
Effective change management is crucial for businesses heavily reliant on software and
services to minimise incidents induced by changes. Unfortunately, in practice it is often …

Incident Prevention Through Reliable Changes Deployment

E Kapel - 2023 IEEE/ACM 45th International Conference on …, 2023 - ieeexplore.ieee.org
Ensuring the reliability of changes deployment is essential to prevent incidents in
businesses that strongly depend on software and services. Incidents should be avoided …

THÈSE de DOCTORAT de l'INSA Lyon, membre de l'Université de Lyon, École Doctorale N° 512 Mathématiques et Informatique (InfoMaths), Spécialité …

Y Remil - 2023 - researchgate.net
Abstract: Supervising modern IT systems poses new challenges in terms of scalability,
reliability, and efficiency. Traditional methods of …