Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision

W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning (DL) has flourished in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …

INFaaS: Automated model-less inference serving

F Romero, Q Li, NJ Yadwadkar… - 2021 USENIX Annual …, 2021 - usenix.org
Despite existing work in machine learning inference serving, ease-of-use and cost efficiency
remain challenges at large scales. Developers must manually search through thousands of …

Serving heterogeneous machine learning models on Multi-GPU servers with Spatio-Temporal sharing

S Choi, S Lee, Y Kim, J Park, Y Kwon… - 2022 USENIX Annual …, 2022 - usenix.org
As machine learning (ML) techniques are applied to a widening range of applications, high
throughput ML inference serving has become critical for online services. Such ML inference …

Rammer: Enabling holistic deep learning compiler optimizations with rTasks

L Ma, Z Xie, Z Yang, J Xue, Y Miao, W Cui… - … USENIX Symposium on …, 2020 - usenix.org
Performing Deep Neural Network (DNN) computation on hardware accelerators efficiently is
challenging. Existing DNN frameworks and compilers often treat the DNN operators in a …

GSLICE: Controlled spatial sharing of GPUs for a scalable inference platform

A Dhakal, SG Kulkarni, KK Ramakrishnan - Proceedings of the 11th ACM …, 2020 - dl.acm.org
The increasing demand for cloud-based inference services requires the use of Graphics
Processing Units (GPUs). It is highly desirable to utilize GPUs efficiently by multiplexing different …

HiveD: Sharing a GPU cluster for deep learning with guarantees

H Zhao, Z Han, Z Yang, Q Zhang, F Yang… - … USENIX symposium on …, 2020 - usenix.org
Deep learning training on a shared GPU cluster is becoming a common practice. However,
we observe severe sharing anomaly in production multi-tenant clusters where jobs in some …

Enabling rack-scale confidential computing using heterogeneous trusted execution environment

J Zhu, R Hou, XF Wang, W Wang, J Cao… - … IEEE Symposium on …, 2020 - ieeexplore.ieee.org
With its huge real-world demands, large-scale confidential computing still cannot be
supported by today's Trusted Execution Environment (TEE), due to the lack of scalable and …

Morphling: Fast, near-optimal auto-configuration for cloud-native model serving

L Wang, L Yang, Y Yu, W Wang, B Li, X Sun… - Proceedings of the …, 2021 - dl.acm.org
Machine learning models are widely deployed in production cloud to provide online
inference services. Efficiently deploying inference services requires careful tuning of …

Dissecting the CUDA scheduling hierarchy: A performance and predictability perspective

IS Olmedo, N Capodieci, JL Martinez… - 2020 IEEE Real …, 2020 - ieeexplore.ieee.org
Over the last few years, the ever-increasing use of Graphic Processing Units (GPUs) in
safety-related domains has opened up many research problems in the real-time community …