Confidence guided anomaly detection model for anti-concept drift in dynamic logs

X Xie, Z Jin, J Wang, L Yang, Y Lu, T Li - Journal of Network and Computer Applications, 2020 - Elsevier
Abstract
Log data records system state and runtime behaviors, and is commonly used to diagnose system failures and detect anomalies. However, the accuracy of log-based anomaly detection algorithms drops dramatically on dynamic logs because systems are more complex than ever before, a phenomenon known as concept drift. In this paper, we design a confidence-guided anomaly detection model that combines multiple algorithms, called Multi-CAD. We first propose a statistical value, p_value, that measures the non-conformity between logs and establishes a link between a new log and previous logs. Multiple suitable algorithms can be chosen as non-conformity measures; they calculate scores for combined detection instead of each making a decision directly. We then design a confidence-guided parameter adjustment method to counter concept drift in dynamic logs. Through a feedback mechanism, the score set is updated with the corresponding label from each trusted result, which contains a label, a non-conformity score, and a confidence, so that it serves as prior experience for subsequent detection. Finally, we demonstrate that Multi-CAD achieves balanced performance in precision rate, recall rate, and F_measure, and detects actual anomalies on multiple datasets. An extensive set of experimental results shows that Multi-CAD improves recall rate and F_measure by almost 20% on average compared with four typical algorithms on the HDFS benchmark dataset, where it achieves 98.2% in precision rate, 95.2% in recall rate, and 96.7% in F_measure.
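To make the p_value and feedback ideas concrete, here is a minimal Python sketch. It is not the Multi-CAD implementation: the abstract does not give the formulas, so the sketch assumes a standard conformal-style p-value (the fraction of previously seen non-conformity scores at least as large as the new one). The function names, the `significance` and `trust` thresholds, and the single "normal" score set are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def p_value(new_score: float, previous_scores: List[float]) -> float:
    """Conformal-style p-value (assumed form, not taken from the paper).

    Returns the fraction of scores, counting the new one itself, that are
    at least as non-conforming as new_score. A small value means the new
    log looks unlike the previous logs.
    """
    at_least = sum(1 for s in previous_scores if s >= new_score)
    return (at_least + 1) / (len(previous_scores) + 1)


def detect(new_score: float, normal_scores: List[float],
           significance: float = 0.05) -> Tuple[str, float]:
    """Label a log from its non-conformity score (hypothetical rule)."""
    p = p_value(new_score, normal_scores)
    label = "anomaly" if p < significance else "normal"
    return label, p


def feed_back(normal_scores: List[float], new_score: float,
              label: str, p: float, trust: float = 0.5) -> None:
    """Append a score to the score set only when the result is trusted.

    The trust threshold is an illustrative assumption; it stands in for
    the paper's confidence-guided update, whose details are not given
    in the abstract.
    """
    if label == "normal" and p >= trust:
        normal_scores.append(new_score)


# Scores of previously seen normal logs: 0.01, 0.02, ..., 0.40
scores = [i / 100 for i in range(1, 41)]

label_a, p_a = detect(0.9, scores)    # far above every previous score
label_b, p_b = detect(0.055, scores)  # well inside the previous range
feed_back(scores, 0.055, label_b, p_b)
```

In this sketch the score 0.9 exceeds every stored score, so its p-value is 1/41 and it is flagged as an anomaly, while 0.055 is labeled normal and is fed back into the score set. Combining several non-conformity measures, as the abstract describes, would mean computing one p-value per measure and merging them; the merging rule is not specified in the abstract and is left out here.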