Serving DNNs like clockwork: Performance predictability from the bottom up

A Gujarati, R Karimi, S Alzayat, W Hao… - … USENIX Symposium on …, 2020 - usenix.org
Machine learning inference is becoming a core building block for interactive web
applications. As a result, the underlying model serving systems on which these applications …

ghOSt: Fast & flexible user-space delegation of Linux scheduling

JT Humphries, N Natu, A Chaugule, O Weisse… - Proceedings of the …, 2021 - dl.acm.org
We present ghOSt, our infrastructure for delegating kernel scheduling decisions to
userspace code. ghOSt is designed to support the rapidly evolving needs of our data center …

Prediction-Based power oversubscription in cloud platforms

AG Kumbhare, R Azimi, I Manousakis… - 2021 USENIX Annual …, 2021 - usenix.org
Prior work has used power capping to shave rare power peaks and add more servers to a
datacenter, thereby oversubscribing its resources and lowering capital costs. This works well …

ServerMore: Opportunistic execution of serverless functions in the cloud

A Suresh, A Gandhi - Proceedings of the ACM symposium on cloud …, 2021 - dl.acm.org
Serverless computing allows customers to submit their jobs to the cloud for execution, with
the resource provisioning being taken care of by the cloud provider. Serverless functions are …

Metrics for sustainability in data centers

A Gandhi, D Lee, Z Liu, S Mu, E Zadok… - ACM SIGENERGY …, 2023 - dl.acm.org
Despite several calls from the community for improving the sustainability of computing,
sufficient progress is yet to be made on one of the key prerequisites of sustainable …

Demeter: QoS-aware CPU scheduling to reduce power consumption of multiple black-box workloads

W Tang, Y Ke, S Fu, H Jiang, J Wu, Q Peng… - Proceedings of the 13th …, 2022 - dl.acm.org
Energy consumption in cloud data centers has become an increasingly important contributor
to greenhouse gas emissions and operation costs. To reduce energy-related costs and …

Characterizing In-Kernel Observability of Latency-Sensitive Request-Level Metrics with eBPF

M Rezvani, A Jahanshahi… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
This paper explores a novel server observability approach using eBPF (extended Berkeley
Packet Filter) for detailed request-level performance metrics of data center latency-sensitive …

PowerMorph: QoS-aware server power reshaping for data center regulation service

A Jahanshahi, N Yu, D Wong - ACM Transactions on Architecture and …, 2022 - dl.acm.org
Adoption of renewable energy in power grids introduces stability challenges in regulating
the operation frequency of the electricity grid. Thus, electrical grid operators call for …

SLO-Power: SLO and power-aware elastic scaling for web services

M Savasci, A Souza, L Wu, D Irwin… - 2024 IEEE 24th …, 2024 - ieeexplore.ieee.org
Managing the performance of online web services in cloud data centers while optimizing
resource allocation and power consumption is a multifaceted challenge. Often, resource and …

Tiresias: Optimizing NUMA Performance with CXL Memory and Locality-Aware Process Scheduling

W Tang, T Ai, J Wu - Proceedings of the ACM Turing Award Celebration …, 2024 - dl.acm.org
The growing demand for memory systems with larger capacities and faster data transfer
speeds has driven progress in the widespread adoption of multi-socket machines and …