Kubernetes scheduling: Taxonomy, ongoing issues and challenges

C Carrión - ACM Computing Surveys, 2022 - dl.acm.org
Continuous integration enables the development of microservices-based applications using
container virtualization technology. Container orchestration systems such as Kubernetes …

Elasticity in cloud computing: state of the art and research challenges

Y Al-Dhuraibi, F Paraiso, N Djarallah… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
Elasticity is a fundamental property in cloud computing that has recently witnessed major
developments. This article reviews both classical and recent elasticity solutions and …

Parallel driving OS: A ubiquitous operating system for autonomous driving in CPSS

L Chen, Y Zhang, B Tian, Y Ai, D Cao… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
With the rapid development of autonomous driving technologies, a vast array of autonomous
driving algorithms and platforms have emerged. These algorithms and platforms are usually …

Pond: CXL-based memory pooling systems for cloud platforms

H Li, DS Berger, L Hsu, D Ernst, P Zardoshti… - Proceedings of the 28th …, 2023 - dl.acm.org
Public cloud providers seek to meet stringent performance requirements and low hardware
cost. A key driver of performance and cost is main memory. Memory pooling promises to …

INFaaS: Automated model-less inference serving

F Romero, Q Li, NJ Yadwadkar… - 2021 USENIX Annual …, 2021 - usenix.org
Despite existing work in machine learning inference serving, ease-of-use and cost efficiency
remain challenges at large scales. Developers must manually search through thousands of …

Gandiva: Introspective cluster scheduling for deep learning

W Xiao, R Bhardwaj, R Ramjee, M Sivathanu… - … USENIX Symposium on …, 2018 - usenix.org
We introduce Gandiva, a new cluster scheduling framework that utilizes domain-specific
knowledge to improve latency and efficiency of training deep learning models in a GPU …

Autopilot: workload autoscaling at Google

K Rzadca, P Findeisen, J Swiderski, P Zych… - Proceedings of the …, 2020 - dl.acm.org
In many public and private Cloud systems, users need to specify a limit for the amount of
resources (CPU cores and RAM) to provision for their workloads. A job that exceeds its limits …

Anvil: Verifying liveness of cluster management controllers

X Sun, W Ma, JT Gu, Z Ma, T Chajed, J Howell… - … USENIX Symposium on …, 2024 - usenix.org
Modern clouds depend crucially on an extensible ecosystem of thousands of controllers,
each managing critical systems (e.g., a ZooKeeper cluster). A controller continuously …

Protean: VM allocation service at scale

O Hadary, L Marshall, I Menache, A Pan… - … USENIX Symposium on …, 2020 - usenix.org
We describe the design and implementation of Protean, the Microsoft Azure service
responsible for allocating Virtual Machines (VMs) to millions of servers around the globe. A …

Ekya: Continuous learning of video analytics models on edge compute servers

R Bhardwaj, Z Xia, G Ananthanarayanan… - … USENIX Symposium on …, 2022 - usenix.org
Video analytics applications use edge compute servers for processing videos. Compressed
models that are deployed on the edge servers for inference suffer from data drift where the …