Abstract
Faults are inevitable in complex online service systems. Compared with textual incident records, a knowledge graph provides an abstract, formal representation of the empirical knowledge of how fluctuations, especially faults, propagate. Recent works use causal discovery tools to construct such a graph for automatic troubleshooting, but they neglect whether the resulting graph is correct.
In this work, we focus on discovering the structure of the fluctuation propagation graph among time series. Our empirical study finds that existing methods either miss a large proportion of the relations or discover an almost complete graph. We therefore propose FPG-Miner, a relation recommendation framework based on active learning. The experiment shows that operators' feedback enables a mining method to recommend the correct relations earlier, accelerating the trustworthy application of intelligent algorithms such as automatic troubleshooting. Moreover, we propose CAR, a novel classification-based approach, to speed up relation discovery. For example, to discover 20% of the correct relations, our approach reduces the verification quota by 2.3–42.2% compared with the baseline approaches.
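The abstract does not specify FPG-Miner's query strategy, CAR's classifier, or the exact definition of the verification quota, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each candidate relation between two time series is described by a small, hypothetical feature vector (feature 0 a baseline mining score); it uses a plain logistic-regression scorer and greedily shows the operator the most likely relation next; and it takes "verification quota" to mean the number of operator checks needed to confirm a given percentage of the true relations. The function names `recommend` and `verification_quota` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (cause series, effect series): one candidate propagation relation


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Estimated probability that a candidate is a true relation; w[-1] is the bias."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logreg(X: List[List[float]], y: List[int],
                 epochs: int = 200, lr: float = 0.5) -> List[float]:
    """Unregularized logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def recommend(candidates: Dict[Pair, List[float]],
              oracle: Callable[[Pair], bool],
              budget: int) -> List[Tuple[Pair, bool]]:
    """Show one candidate at a time to the operator (oracle) and retrain on the feedback.

    Assumption: until both a confirmed and a rejected relation have been seen,
    candidates are ranked by feature 0 (a hypothetical baseline mining score).
    """
    pool = dict(candidates)
    X: List[List[float]] = []
    y: List[int] = []
    w: Optional[List[float]] = None
    history: List[Tuple[Pair, bool]] = []
    for _ in range(min(budget, len(pool))):
        if w is None:
            pick = max(pool, key=lambda p: pool[p][0])
        else:
            pick = max(pool, key=lambda p: predict(w, pool[p]))
        x = pool.pop(pick)
        ok = oracle(pick)  # operator verifies the recommended relation
        history.append((pick, ok))
        X.append(x)
        y.append(1 if ok else 0)
        if len(set(y)) == 2:  # a classifier needs both classes before it is useful
            w = train_logreg(X, y)
    return history


def verification_quota(history: List[Tuple[Pair, bool]],
                       total_true: int, percent: int = 20) -> Optional[int]:
    """Operator checks needed until `percent`% of the true relations are confirmed."""
    needed = -(-percent * total_true // 100)  # integer ceiling division
    found = 0
    for checks, (_, ok) in enumerate(history, start=1):
        found += ok
        if found >= needed:
            return checks
    return None  # target not reached within the budget


# Toy data (hypothetical features: [baseline mining score, lagged-correlation score]).
candidates = {
    ("cpu", "latency"): [0.90, 0.10],
    ("disk", "qps"): [0.85, 0.20],
    ("mem", "latency"): [0.80, 0.90],
    ("log", "disk"): [0.75, 0.05],
    ("qps", "cpu"): [0.40, 0.95],
    ("net", "latency"): [0.30, 0.80],
}
truth = {("mem", "latency"), ("qps", "cpu"), ("net", "latency")}
history = recommend(candidates, lambda p: p in truth, budget=len(candidates))
```

On this toy data, ranking by the baseline score alone needs six checks to confirm all three true relations, while the feedback loop needs five: once the first relation is confirmed, the classifier learns that the second feature is the informative one. Always showing the most likely relation is closer to relevance feedback than to uncertainty sampling; it is chosen here because the stated goal is to surface correct relations early, not to minimize classifier error.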
Most work was done when Minghua Ma and Xiaohui Nie were at Tsinghua University.
Acknowledgment
We thank Ruming Tang for proofreading this paper. This work is supported by the National Key R&D Program of China under Grant 2019YFB1802504, and the State Key Program of National Natural Science of China under Grant 62072264.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, M. et al. (2022). Mining Fluctuation Propagation Graph Among Time Series with Active Learning. In: Strauss, C., Cuzzocrea, A., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2022. Lecture Notes in Computer Science, vol 13426. Springer, Cham. https://doi.org/10.1007/978-3-031-12423-5_17
DOI: https://doi.org/10.1007/978-3-031-12423-5_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-12422-8
Online ISBN: 978-3-031-12423-5
eBook Packages: Computer Science, Computer Science (R0)