Workflow-aware automatic fault diagnosis for microservice-based applications with statistics

T Wang, W Zhang, J Xu, Z Gu
IEEE Transactions on Network and Service Management, 2020 - ieeexplore.ieee.org
Microservice architectures bring many benefits, e.g., faster delivery, improved scalability, and greater autonomy, so they are widely adopted to develop and operate Internet-based applications. Effectively diagnosing faults in applications composed of many dynamic microservices has become key to guaranteeing application performance and reliability. Because a microservice behaves differently in the different workflows that process requests, existing approaches often cannot accurately locate the root cause of a fault in an application with interacting microservices in a dynamic deployment environment. We propose a workflow-aware, statistics-based automatic fault diagnosis approach for microservice-based applications. We characterize traces across microservices as calling trees, and then learn trace patterns as baselines. For faults that affect request-processing workflows, we estimate the workflows' anomaly degrees, and then locate the microservices causing the anomalies by using tree edit distance to measure the difference between current traces and the learned baselines. For performance anomalies that significantly increase response time, we employ principal component analysis to extract suspicious microservices with large fluctuations in response time. Finally, we evaluate our approach on three typical microservice-based applications in a series of experiments. The results show that our approach can accurately locate the microservices causing anomalies.
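To make the trace-comparison step concrete, the following is a minimal sketch of an ordered tree edit distance between a baseline calling tree and a current one. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which tree edit distance variant or cost model is used, so unit costs for insert, delete, and relabel are assumed here, and the service names (gateway, orders, inventory, payment) and function names are hypothetical.

```python
from functools import lru_cache

# A calling tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Tuples keep everything hashable for caching.

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests (assumed cost model)."""
    if not f1:
        return size(f2)  # insert every remaining node of f2
    if not f2:
        return size(f1)  # delete every remaining node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    return min(
        # delete the root of the rightmost tree in f1
        forest_distance(f1[:-1] + kids1, f2) + 1,
        # insert the root of the rightmost tree in f2
        forest_distance(f1, f2[:-1] + kids2) + 1,
        # match the two roots, relabeling if the labels differ
        forest_distance(f1[:-1], f2[:-1])
        + forest_distance(kids1, kids2)
        + (label1 != label2),
    )

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

# Hypothetical baseline trace: gateway calls orders (which calls inventory) and payment.
baseline = ("gateway", (("orders", (("inventory", ()),)), ("payment", ())))
# Hypothetical current trace: the call to payment is missing.
current = ("gateway", (("orders", (("inventory", ()),)),))

print(tree_edit_distance(baseline, current))
```

In this sketch a nonzero distance flags a trace whose call structure deviates from the baseline. Identifying which microservice is responsible would additionally require recovering the edit operations themselves, which this example does not do.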