Splitwise: Efficient generative LLM inference using phase splitting

P Patel, E Choukse, C Zhang, A Shah, Í Goiri… - Power, 2023 - homes.cs.washington.edu
Generative large language model (LLM) applications are growing rapidly, leading to large-
scale deployments of expensive and power-hungry GPUs. Our characterization of LLM …

Hybrid Heterogeneous Clusters Can Lower the Energy Consumption of LLM Inference Workloads

G Wilkins, S Keshav, R Mortier - Proceedings of the 15th ACM …, 2024 - dl.acm.org
Both the training and use of Large Language Models (LLMs) require large amounts of
energy. Their increasing popularity, therefore, raises critical concerns regarding the energy …

An agile pathway towards carbon-aware clouds

P Patel, T Gregersen, T Anderson - … of the 2nd Workshop on Sustainable …, 2023 - dl.acm.org
Climate change is a pressing threat to planetary well-being that can be addressed only by
rapid near-term actions across all sectors. Yet, the cloud computing sector, with its …
