Tools and benchmarks for automated log parsing

J Zhu, S He, J Liu, P He, Q Xie… - 2019 IEEE/ACM 41st …, 2019 - ieeexplore.ieee.org
Logs are imperative in the development and maintenance process of many software
systems. They record detailed runtime information that allows developers and support …

Towards the use of the readily available tests from the release pipeline as performance tests: Are we there yet?

Z Ding, J Chen, W Shang - Proceedings of the ACM/IEEE 42nd …, 2020 - dl.acm.org
Performance is one of the important aspects of software quality. Performance issues exist
widely in software systems, and the process of fixing the performance issues is an essential …

A comprehensive survey of logging in software: From logging statements automation to log mining and analysis

S Gholamian, PAS Ward - arXiv preprint arXiv:2110.12489, 2021 - arxiv.org
Logs are widely used to record runtime information of software systems, such as the
timestamp and the importance of an event, the unique ID of the source of the log, and a part …

An empirical investigation of incident triage for online service systems

J Chen, X He, Q Lin, Y Xu, H Zhang… - 2019 IEEE/ACM 41st …, 2019 - ieeexplore.ieee.org
Online service systems have become increasingly popular. During operation of an online
service system, incidents (unplanned interruptions or outages of the service) are inevitable …

How incidental are the incidents? Characterizing and prioritizing incidents for large-scale online service systems

J Chen, S Zhang, X He, Q Lin, H Zhang, D Hao… - Proceedings of the 35th …, 2020 - dl.acm.org
Although tremendous efforts have been devoted to the quality assurance of online service
systems, in reality, these systems still come across many incidents (i.e., unplanned …

Predicting node failures in an ultra-large-scale cloud computing platform: An AIOps solution

Y Li, ZM Jiang, H Li, AE Hassan, C He… - ACM Transactions on …, 2020 - dl.acm.org
Many software services today are hosted on cloud computing platforms, such as Amazon
EC2, due to many benefits like reduced operational costs. However, node failures in these …

Logzip: Extracting hidden structures via iterative clustering for log compression

J Liu, J Zhu, S He, P He, Z Zheng… - 2019 34th IEEE/ACM …, 2019 - ieeexplore.ieee.org
System logs record detailed runtime information of software systems and are used as the
main data source for many tasks around software engineering. As modern software systems …

How to mitigate the incident? An effective troubleshooting guide recommendation technique for online service systems

J Jiang, W Lu, J Chen, Q Lin, P Zhao, Y Kang… - Proceedings of the 28th …, 2020 - dl.acm.org
In recent years, more and more traditional shrink-wrapped software is provided as 7x24
online services. Incidents (events that lead to service disruptions or outages) could affect …

An exploratory study of performance regression introducing code changes

J Chen, W Shang - 2017 IEEE International Conference on …, 2017 - ieeexplore.ieee.org
Performance is an important aspect of software quality. In fact, large software systems
failures are often due to performance issues rather than functional bugs. One of the most …

An empirical study of the impact of data splitting decisions on the performance of AIOps solutions

Y Lyu, H Li, M Sayagh, ZM Jiang… - ACM Transactions on …, 2021 - dl.acm.org
AIOps (Artificial Intelligence for IT Operations) leverages machine learning models to help
practitioners handle the massive data produced during the operations of large-scale …