Resource central: Understanding and predicting workloads for improved resource management in large cloud platforms

E Cortez, A Bonde, A Muzio, M Russinovich… - Proceedings of the 26th …, 2017 - dl.acm.org
Proceedings of the 26th Symposium on Operating Systems Principles, 2017 - dl.acm.org
Cloud research to date has lacked data on the characteristics of the production virtual machine (VM) workloads of large cloud providers. A thorough understanding of these characteristics can inform the providers' resource management systems, e.g., the VM scheduler, power manager, and server health manager. In this paper, we first introduce an extensive characterization of Microsoft Azure's VM workload, including distributions of the VMs' lifetime, deployment size, and resource consumption. We then show that certain VM behaviors are fairly consistent over multiple lifetimes, i.e., history is an accurate predictor of future behavior. Based on this observation, we next introduce Resource Central (RC), a system that collects VM telemetry, learns these behaviors offline, and provides predictions online to various resource managers via a general client-side library. As an example of RC's online use, we modify Azure's VM scheduler to leverage predictions when oversubscribing servers (with oversubscribable VM types) while retaining high VM performance. Using real VM traces, we then show that the prediction-informed schedules increase utilization and prevent physical resource exhaustion. We conclude that providers can exploit their workloads' characteristics and machine learning to improve resource management substantially.
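The idea of using history as a predictor and feeding predictions to a scheduler can be illustrated with a small sketch. This is not RC's actual model or API: the abstract does not describe either, so every name here (`Prediction`, `predict_utilization`, `Server`, `try_place`), the bucketed utilization values, and the confidence threshold are hypothetical. The predictor simply takes the most common utilization bucket among a subscription's past VMs, and the placement check falls back to the full allocation whenever the prediction is weak.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Prediction:
    bucket_upper: float  # predicted upper bound on CPU use, as a fraction of allocated cores
    confidence: float    # share of past VMs that fell into the predicted bucket


def predict_utilization(history, default=1.0):
    """Predict a utilization bucket from a subscription's past VMs (hypothetical helper).

    `history` lists the bucket upper bounds (e.g. 0.25, 0.5, 0.75, 1.0)
    observed for earlier VMs. With no history, assume full utilization.
    """
    if not history:
        return Prediction(default, 0.0)
    bucket, count = Counter(history).most_common(1)[0]
    return Prediction(bucket, count / len(history))


@dataclass
class Server:
    physical_cores: int
    predicted_load: float = 0.0  # sum of predicted core usage of placed VMs

    def try_place(self, vm_cores, prediction, min_confidence=0.6):
        """Place a VM only if predicted usage stays within physical capacity."""
        # Assumed policy: ignore low-confidence predictions and reserve every core.
        if prediction.confidence >= min_confidence:
            usage = prediction.bucket_upper
        else:
            usage = 1.0
        expected = vm_cores * usage
        if self.predicted_load + expected > self.physical_cores:
            return False
        self.predicted_load += expected
        return True


server = Server(physical_cores=16)
known = predict_utilization([0.25, 0.25, 0.5, 0.25])  # consistent past behavior
unknown = predict_utilization([])                     # no history available
placed_known = server.try_place(8, known)       # counts as 2 predicted cores
placed_unknown = server.try_place(8, unknown)   # counts as the full 8 cores
placed_extra = server.try_place(8, unknown)     # would exceed 16 cores, rejected
```

Here 24 virtual cores are requested against 16 physical ones; the low-utilization VM is packed densely, while VMs without a trustworthy prediction are treated conservatively, which is the general trade-off between utilization and resource exhaustion that the abstract describes.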
ACM Digital Library