OptimML: Joint Control of Inference Latency and Server Power Consumption for ML Performance Optimization

G Chen, X Wang - ACM Transactions on Autonomous and Adaptive …, 2024 - dl.acm.org
Power capping is an important technique for high-density servers to safely oversubscribe the
power infrastructure in a data center. However, power capping is commonly accomplished …

Performance optimization of machine learning inference under latency and server power constraints

G Chen, X Wang - 2022 IEEE 42nd International Conference on …, 2022 - ieeexplore.ieee.org
Power capping is an important technique for high-density servers to safely oversubscribe the
power infrastructure in a data center. However, power capping is commonly accomplished …

Efficient Processing of Machine Learning Workloads on Data Center Servers

G Chen - 2024 - rave.ohiolink.edu
Data centers often need to run lots of Machine Learning (ML) applications with stringent
Service-Level Objective (SLO) requirements, such as inference latency. With bursty Machine …

Adaptive power management through thermal aware workload balancing in internet data centers

J Yao, H Guan, J Luo, L Rao… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org
The past decade witnessed the tremendous growth of online services and applications.
Together with the increase of cloud computing, more and more computation are hosted by …

DeepPower: Deep Reinforcement Learning based Power Management for Latency Critical Applications in Multi-core Systems

J Zhang, G Yu, Z He, L Ai, P Chen - Proceedings of the 52nd …, 2023 - dl.acm.org
Latency-critical (LC) applications are widely deployed in modern datacenters. Effective
power management for LC applications can yield significant cost savings. However, it poses …

A reinforcement learning approach for performance-aware reduction in power consumption of data center compute nodes

A Raj, S Perarnau, A Gokhale - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
As Exascale computing becomes a reality, the energy needs of compute nodes in cloud data
centers will continue to grow. A common approach to reducing this energy demand is to limit …

Pack & cap: adaptive DVFS and thread packing under power caps

R Cochran, C Hankendi, AK Coskun… - Proceedings of the 44th …, 2011 - dl.acm.org
The ability to cap peak power consumption is a desirable feature in modern data centers for
energy budgeting, cost management, and efficient power delivery. Dynamic voltage and …

Network packet processing mode-aware power management for data center servers

KD Kang, G Park, NS Kim, D Kim - IEEE Computer Architecture …, 2019 - ieeexplore.ieee.org
In data center servers, power management (PM) exploiting Dynamic Voltage and Frequency
Scaling (DVFS) for processors can play a crucial role to improve energy efficiency. However …

SleepScale: Runtime joint speed scaling and sleep states management for power efficient data centers

Y Liu, SC Draper, NS Kim - ACM SIGARCH Computer Architecture News, 2014 - dl.acm.org
Power consumption in data centers has been growing significantly in recent years. To
reduce power, servers are being equipped with increasingly sophisticated power …

Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving

Y Dai, R Pan, A Iyer, K Li, R Netravali - arXiv preprint arXiv:2312.05385, 2023 - arxiv.org
Machine learning (ML) inference platforms are tasked with balancing two competing goals:
ensuring high throughput given many requests, and delivering low-latency responses to …