A joint study of the challenges, opportunities, and roadmap of MLOps and AIOps: A systematic survey

J Diaz-De-Arcaya, AI Torre-Bastida, G Zarate… - ACM Computing …, 2023 - dl.acm.org
Data science projects represent a greater challenge than software engineering for
organizations pursuing their adoption. The diverse stakeholders involved emphasize the …

Adopting artificial intelligence technology for network operations in digital transformation

S Min, B Kim - Administrative Sciences, 2024 - mdpi.com
This study aims to define factors that affect Artificial Intelligence (AI) technology introduction
to network operations and analyze the relative importance of such factors. Based on this …

ESRO: Experience Assisted Service Reliability against Outages

S Chakraborty, S Agarwal, S Garg… - 2023 38th IEEE/ACM …, 2023 - ieeexplore.ieee.org
Modern cloud services are prone to failures due to their complex architecture, making
diagnosis a critical process. Site Reliability Engineers (SREs) spend hours leveraging …

Training-free retrieval-based log anomaly detection with pre-trained language model considering token-level information

G No, Y Lee, H Kang, P Kang - Engineering Applications of Artificial …, 2024 - Elsevier
As the information technology industry advances, the demand for log anomaly detection,
based solely on printed log text, is growing. However, identifying anomalies in rapidly …

RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models

Z Wang, Z Liu, Y Zhang, A Zhong, L Fan, L Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language model (LLM) applications in cloud root cause analysis (RCA) have been
actively explored recently. However, current methods are still reliant on manual workflow …

Efficient Resource Utilization in IoT and Cloud Computing

VK Prasad, D Dansana, MD Bhavsar, B Acharya… - Information, 2023 - mdpi.com
With the proliferation of IoT devices, there has been exponential growth in data generation,
placing substantial demands on both cloud computing (CC) and internet infrastructure. CC …

Learning to Diagnose: Meta-Learning for Efficient Adaptation in Few-Shot AIOps Scenarios

Y Duan, H Bao, G Bai, Y Wei, K Xue, Z You, Y Zhang… - Electronics, 2024 - mdpi.com
With the advancement of technologies like 5G, cloud computing, and microservices, the
complexity of network management systems and the variety of technical components have …

ADARMA auto-detection and auto-remediation of microservice anomalies by leveraging large language models

S Komal, N Zakeya, R Raphael, A Harit… - Proceedings of the 33rd …, 2023 - dl.acm.org
In microservice architecture, anomalies can cause slow response times or poor user
experience if not detected early. Manual detection can be time-consuming and error-prone …

StreamAD: A cloud platform metrics-oriented benchmark for unsupervised online anomaly detection

J Xu, C Lin, F Liu, Y Wang, W Xiong, Z Li… - BenchCouncil …, 2023 - Elsevier
Cloud platforms, serving as fundamental infrastructure, play a significant role in developing
modern applications. In recent years, there has been growing interest among researchers in …

RAPID: Training-free Retrieval-based Log Anomaly Detection with PLM considering Token-level information

G No, Y Lee, H Kang, P Kang - arXiv preprint arXiv:2311.05160, 2023 - arxiv.org
As the IT industry advances, system log data becomes increasingly crucial. Many computer
systems rely on log texts for management due to restricted access to source code. The need …