Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision

W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning (DL) has flourished in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …

INFaaS: Automated model-less inference serving

F Romero, Q Li, NJ Yadwadkar… - 2021 USENIX Annual …, 2021 - usenix.org
Despite existing work in machine learning inference serving, ease-of-use and cost efficiency
remain challenges at large scales. Developers must manually search through thousands of …

Serving heterogeneous machine learning models on Multi-GPU servers with Spatio-Temporal sharing

S Choi, S Lee, Y Kim, J Park, Y Kwon… - 2022 USENIX Annual …, 2022 - usenix.org
As machine learning (ML) techniques are applied to a widening range of applications, high
throughput ML inference serving has become critical for online services. Such ML inference …

Rammer: Enabling holistic deep learning compiler optimizations with rTasks

L Ma, Z Xie, Z Yang, J Xue, Y Miao, W Cui… - … USENIX Symposium on …, 2020 - usenix.org
Performing Deep Neural Network (DNN) computation on hardware accelerators efficiently is
challenging. Existing DNN frameworks and compilers often treat the DNN operators in a …

GSLICE: Controlled spatial sharing of GPUs for a scalable inference platform

A Dhakal, SG Kulkarni, KK Ramakrishnan - Proceedings of the 11th ACM …, 2020 - dl.acm.org
The increasing demand for cloud-based inference services requires the use of Graphics
Processing Units (GPUs). It is highly desirable to utilize GPUs efficiently by multiplexing different …

HiveD: Sharing a GPU cluster for deep learning with guarantees

H Zhao, Z Han, Z Yang, Q Zhang, F Yang… - … USENIX symposium on …, 2020 - usenix.org
Deep learning training on a shared GPU cluster is becoming a common practice. However,
we observe severe sharing anomaly in production multi-tenant clusters where jobs in some …

Enabling rack-scale confidential computing using heterogeneous trusted execution environment

J Zhu, R Hou, XF Wang, W Wang, J Cao… - … IEEE Symposium on …, 2020 - ieeexplore.ieee.org
With its huge real-world demands, large-scale confidential computing still cannot be
supported by today's Trusted Execution Environment (TEE), due to the lack of scalable and …

Morphling: Fast, near-optimal auto-configuration for cloud-native model serving

L Wang, L Yang, Y Yu, W Wang, B Li, X Sun… - Proceedings of the …, 2021 - dl.acm.org
Machine learning models are widely deployed in production cloud to provide online
inference services. Efficiently deploying inference services requires careful tuning of …

Dissecting the CUDA scheduling hierarchy: A performance and predictability perspective

IS Olmedo, N Capodieci, JL Martinez… - 2020 IEEE Real …, 2020 - ieeexplore.ieee.org
Over the last few years, the ever-increasing use of Graphic Processing Units (GPUs) in
safety-related domains has opened up many research problems in the real-time community …