Dynamic TCP initial windows and congestion control schemes through reinforcement learning

X Nie, Y Zhao, Z Li, G Chen, K Sui… - IEEE Journal on …, 2019 - ieeexplore.ieee.org
Despite many years of improvements to it, TCP still suffers from an unsatisfactory
performance. For services dominated by short flows (e.g., web search and e-commerce), TCP …

Bridging machine learning and computer network research: a survey

Y Cheng, J Geng, Y Wang, J Li, D Li, J Wu - CCF Transactions on …, 2019 - Springer
With the booming development of artificial intelligence (AI), a series of relevant applications
are emerging and promoting an all-rounded reform of the industry. As the major technology …

Improving TCP congestion control with machine intelligence

Y Kong, H Zang, X Ma - Proceedings of the 2018 Workshop on Network …, 2018 - dl.acm.org
In a TCP/IP network, a key to ensure efficient and fair sharing of network resources among
its users is the TCP congestion control (CC) scheme. Previously, the design of TCP CC …

Hotspot: Anomaly localization for additive KPIs with multi-dimensional attributes

Y Sun, Y Zhao, Y Su, D Liu, X Nie, Y Meng… - IEEE …, 2018 - ieeexplore.ieee.org
Additive key performance indicators (KPIs) (such as page view (PV), revenue, and error
count) with multi-dimensional attributes (such as ISP, Province, and DataCenter) are …

Generic and robust localization of multi-dimensional root causes

Z Li, C Luo, Y Zhao, Y Sun, K Sui… - 2019 IEEE 30th …, 2019 - ieeexplore.ieee.org
Operators of online software services periodically collect various measures with many
attributes. When a measure becomes abnormal, indicating service problems such as …

Codec: Cost-effective duration prediction system for deadline scheduling in the cloud

H Li, M Ma, Y Liu, S Qin, B Qiao, R Yao… - 2023 IEEE 34th …, 2023 - ieeexplore.ieee.org
Modern cloud platforms allow customers to flexibly allocate or release computing resources.
One crucial scenario is how to drive existing VMs to a specific state by a given deadline in a …

Autoroot: A novel fault localization schema of multi-dimensional root causes

P Jing, Y Han, J Sun, T Lin, Y Hu - 2021 IEEE Wireless …, 2021 - ieeexplore.ieee.org
The key challenge for large scale software system maintenance is to minimize the
troubleshooting time when severe system anomaly (e.g., server failure, link congestion …

Multi-task sequence learning for performance prediction and KPI mining in database management system

C Wan, W Li, W Ding, Z Zhang, Q Lu, L Qian, J Xu… - Information …, 2021 - Elsevier
Predicting future performance curve and mining the top-K influential KPIs are two important
tasks for Database Management System (DBMS) operations. In this paper, we propose a …

Machine-learning-based intelligent operations and maintenance

D Pei, S Zhang, C Pei, Alibaba Group - Communications of the CCF, 2017 - netman.aiops.org
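Many aspects of production and daily life in modern society depend on large, complex software and hardware systems, including the Internet, high-performance computing, telecommunications,
finance, power grids, the Internet of Things, medical networks and devices, aerospace, and military equipment and networks. Users of these systems all expect …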

Reducing web latency through dynamically setting TCP initial window with reinforcement learning

X Nie, Y Zhao, D Pei, G Chen, K Sui… - 2018 IEEE/ACM 26th …, 2018 - ieeexplore.ieee.org
Latency, which directly affects the user experience and revenue of web services, is far from
ideal in reality, due to the well-known TCP flow startup problem. Specifically, since TCP …