MODE: automated neural network model debugging via state differential analysis and input selection

S Ma, Y Liu, WC Lee, X Zhang, A Grama - … of the 2018 26th ACM Joint …, 2018 - dl.acm.org
Artificial intelligence models are becoming an integral part of modern computing systems.
Just like software inevitably has bugs, models have bugs too, leading to poor classification …

Correlating events with time series for incident diagnosis

C Luo, JG Lou, Q Lin, Q Fu, R Ding, D Zhang… - Proceedings of the 20th …, 2014 - dl.acm.org
As online services have become more and more popular, incident diagnosis has emerged as a
critical task in minimizing the service downtime and ensuring high quality of the services …

Gandalf: An intelligent, end-to-end analytics service for safe deployment in large-scale cloud infrastructure

Z Li, Q Cheng, K Hsieh, Y Dang, P Huang… - … USENIX Symposium on …, 2020 - usenix.org
Modern cloud systems have a vast number of components that continuously undergo
updates. Deploying these frequent updates quickly without breaking the system is …

Software analytics in practice

D Zhang, S Han, Y Dang, JG Lou, H Zhang… - IEEE …, 2013 - ieeexplore.ieee.org
With software analytics, software practitioners explore and analyze data to obtain insightful,
actionable information for tasks regarding software development, systems, and users. The …

Context-sensitive delta inference for identifying workload-dependent performance bottlenecks

X Xiao, S Han, D Zhang, T Xie - … of the 2013 International Symposium on …, 2013 - dl.acm.org
Software hangs can be caused by expensive operations in responsive actions (such as
time-consuming operations in UI threads). Some of the expensive operations depend on the input …

Software analytics for incident management of online services: An experience report

JG Lou, Q Lin, R Ding, Q Fu, D Zhang… - 2013 28th IEEE/ACM …, 2013 - ieeexplore.ieee.org
As online services become more and more popular, incident management has become a
critical task that aims to minimize the service downtime and to ensure high quality of the …

Learning a hierarchical monitoring system for detecting and diagnosing service issues

V Nair, A Raul, S Khanduja, V Bahirwani… - Proceedings of the 21st …, 2015 - dl.acm.org
We propose a machine learning based framework for building a hierarchical monitoring
system to detect and diagnose service issues. We demonstrate its use for building a …

CarStream: an industrial system of big data processing for internet-of-vehicles

M Zhang, T Wo, T Xie, X Lin, Y Liu - Proceedings of the VLDB …, 2017 - dl.acm.org
As the Internet-of-Vehicles (IoV) technology becomes an increasingly important trend for
future transportation, designing large-scale IoV systems has become a critical task that aims …

Hot Fixing Software: A Comprehensive Review of Terminology, Techniques, and Applications

C Hanna, D Clark, F Sarro, J Petke - arXiv preprint arXiv:2401.09275, 2024 - arxiv.org
A hot fix is an improvement to a specific time-critical issue deployed to a software system in
production. While hot fixing is an essential and common activity in software maintenance, it …

Identifying recurrent and unknown performance issues

MH Lim, JG Lou, H Zhang, Q Fu… - … Conference on Data …, 2014 - ieeexplore.ieee.org
For a large-scale software system, especially an online service system, when a performance
issue occurs, it is desirable to check whether this issue has occurred before. If there are past …