Network performance monitoring today is restricted by existing switch support for measurement, forcing operators to rely heavily on endpoints with poor visibility into the …
Network performance anomalies (NPAs), eg long-tailed latency, bandwidth decline, etc., are increasingly crucial to cloud providers as applications are getting more sensitive to …
Network Traffic Monitoring and Analysis (NTMA) represents a key component for network management, especially to guarantee the correct operation of large-scale networks such as …
Data center network (DCN) is the backbone of many emerging applications from smart connected homes to smart traffic control and is continuously evolving to meet the diverse …
With the growing market of cloud databases, careful detection and elimination of slow queries are of great importance to service stability. Previous studies focus on optimizing the …
R Miao, L Zhu, S Ma, K Qian, S Zhuang, B Li… - Proceedings of the …, 2022 - dl.acm.org
This paper presents the two generations of storage network stacks that reduced the average I/O latency of Alibaba Cloud's EBS service by 72% in the last five years: Luna, a user-space …
C Tan, Z Jin, C Guo, T Zhang, H Wu, K Deng… - … USENIX Symposium on …, 2019 - usenix.org
The availability of data center services is jeopardized by various network incidents. One of the biggest challenges for network incident handling is to accurately localize the failures …
Network failures continue to plague datacenter operators as their symptoms may not have direct correlation with where or why they occur. We introduce 007, a lightweight, always-on …
Y Chen, H Xie, M Ma, Y Kang, X Gao, L Shi… - Proceedings of the …, 2024 - dl.acm.org
Ensuring the reliability and availability of cloud services necessitates efficient root cause analysis (RCA) for cloud incidents. Traditional RCA methods, which rely on manual …