Kubernetes scheduling: Taxonomy, ongoing issues and challenges

C Carrión - ACM Computing Surveys, 2022 - dl.acm.org
Continuous integration enables the development of microservices-based applications using
container virtualization technology. Container orchestration systems such as Kubernetes …

Computation offloading toward edge computing

L Lin, X Liao, H Jin, P Li - Proceedings of the IEEE, 2019 - ieeexplore.ieee.org
We are living in a world where massive end devices perform computing everywhere and
every day. However, these devices are constrained by the battery and computational …

Pond: CXL-based memory pooling systems for cloud platforms

H Li, DS Berger, L Hsu, D Ernst, P Zardoshti… - Proceedings of the 28th …, 2023 - dl.acm.org
Public cloud providers seek to meet stringent performance requirements and low hardware
cost. A key driver of performance and cost is main memory. Memory pooling promises to …

FIRM: An intelligent fine-grained resource management framework for SLO-Oriented microservices

H Qiu, SS Banerjee, S Jha, ZT Kalbarczyk… - 14th USENIX symposium …, 2020 - usenix.org
User-facing latency-sensitive web services include numerous distributed,
intercommunicating microservices that promise to simplify software development and …

Learning scheduling algorithms for data processing clusters

H Mao, M Schwarzkopf, SB Venkatakrishnan… - Proceedings of the …, 2019 - dl.acm.org
Efficiently scheduling data processing jobs on distributed compute clusters requires complex
algorithms. Current systems use simple, generalized heuristics and ignore workload …

An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems

Y Gan, Y Zhang, D Cheng, A Shetty, P Rathi… - Proceedings of the …, 2019 - dl.acm.org
Cloud services have recently started undergoing a major shift from monolithic applications,
to graphs of hundreds or thousands of loosely-coupled microservices. Microservices …

Borg: the next generation

M Tirmazi, A Barker, N Deng, ME Haque… - Proceedings of the …, 2020 - dl.acm.org
This paper analyzes a newly-published trace that covers 8 different Borg [35] clusters for the
month of May 2019. The trace enables researchers to explore how scheduling works in …

Ray: A distributed framework for emerging AI applications

P Moritz, R Nishihara, S Wang, A Tumanov… - … USENIX symposium on …, 2018 - usenix.org
The next generation of AI applications will continuously interact with the environment and
learn from these interactions. These applications impose new and demanding systems …

Splitwise: Efficient generative LLM inference using phase splitting

P Patel, E Choukse, C Zhang, A Shah… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Generative large language model (LLM) applications are growing rapidly, leading to
large-scale deployments of expensive and power-hungry GPUs. Our characterization of LLM …

Autopilot: workload autoscaling at Google

K Rzadca, P Findeisen, J Swiderski, P Zych… - Proceedings of the …, 2020 - dl.acm.org
In many public and private Cloud systems, users need to specify a limit for the amount of
resources (CPU cores and RAM) to provision for their workloads. A job that exceeds its limits …