Analysis of Large-Scale Multi-Tenant GPU clusters for DNN training workloads

M Jeon, S Venkataraman, A Phanishayee… - 2019 USENIX Annual …, 2019 - usenix.org
With widespread advances in machine learning, a number of large enterprises are
beginning to incorporate machine learning models across a number of products. These …

Multi-tenant GPU clusters for deep learning workloads: Analysis and implications

M Jeon, S Venkataraman, J Qian… - Technical report …, 2018 - microsoft.com
With widespread advances in machine learning, a number of large enterprises are
beginning to incorporate machine learning models across a number of products. These …

Looking beyond GPUs for DNN scheduling on Multi-Tenant clusters

J Mohan, A Phanishayee, J Kulkarni… - … USENIX Symposium on …, 2022 - usenix.org
Training Deep Neural Networks (DNNs) is a popular workload in both enterprises and cloud
data centers. Existing schedulers for DNN training consider GPU as the dominant resource …

AntMan: Dynamic scaling on GPU clusters for deep learning

W Xiao, S Ren, Y Li, Y Zhang, P Hou, Z Li… - … USENIX Symposium on …, 2020 - usenix.org
Efficiently scheduling deep learning jobs on large-scale GPU clusters is crucial for job
performance, system throughput, and hardware utilization. It is getting ever more …

Gandiva: Introspective cluster scheduling for deep learning

W Xiao, R Bhardwaj, R Ramjee, M Sivathanu… - … USENIX Symposium on …, 2018 - usenix.org
We introduce Gandiva, a new cluster scheduling framework that utilizes domain-specific
knowledge to improve latency and efficiency of training deep learning models in a GPU …

Tiresias: A GPU cluster manager for distributed deep learning

J Gu, M Chowdhury, KG Shin, Y Zhu, M Jeon… - … USENIX Symposium on …, 2019 - usenix.org
Deep learning (DL) training jobs bring some unique challenges to existing cluster
managers, such as unpredictable training times, an all-or-nothing execution model, and …
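
The "all-or-nothing execution model" this snippet mentions is gang scheduling: a distributed training job can make progress only once every GPU it requests is available, so partial allocations are never granted. Below is a minimal toy sketch of that admission rule, with hypothetical job names and a simple backfill-style queue scan; it is an illustration of the concept, not Tiresias's actual scheduling policy.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    gpus_needed: int  # the job runs with all of these GPUs or none at all

def try_gang_schedule(queue, free_gpus):
    """Admit queued jobs greedily (backfill-style), but only when the
    job's full GPU gang fits; smaller allocations are never granted."""
    running = []
    for job in queue:
        if job.gpus_needed <= free_gpus:
            free_gpus -= job.gpus_needed
            running.append(job.name)
        # else: the job keeps waiting for its whole gang -- the behavior
        # that makes DNN jobs hard for classic cluster managers.
    return running, free_gpus

queue = [Job("resnet", 8), Job("bert", 16), Job("gan", 4)]
running, left = try_gang_schedule(queue, free_gpus=12)
print(running, left)  # ['resnet', 'gan'] 0 -- bert waits for all 16 GPUs
```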

Themis: Fair and efficient GPU cluster scheduling

K Mahajan, A Balasubramanian, A Singhvi… - … USENIX Symposium on …, 2020 - usenix.org
Modern distributed machine learning (ML) training workloads benefit significantly from
leveraging GPUs. However, significant contention ensues when multiple such workloads are …

Elastic deep learning in multi-tenant GPU clusters

Y Wu, K Ma, X Yan, Z Liu, Z Cai… - … on Parallel and …, 2021 - ieeexplore.ieee.org
We study how to support elasticity, that is, the ability to dynamically adjust the parallelism (i.e.,
the number of GPUs), for deep neural network (DNN) training in a GPU cluster. Elasticity can …
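
As a concrete reading of the elasticity this snippet defines, the sketch below rescales a job's per-GPU batch size as the scheduler grows or shrinks its GPU count, keeping the global batch fixed. It is a hedged illustration under that assumption; the constant and function names are hypothetical and not taken from the paper.

```python
GLOBAL_BATCH = 1024  # hypothetical fixed global batch size

def rescale(gpus: int) -> dict:
    """Recompute per-GPU work after the scheduler grows or shrinks a job."""
    assert GLOBAL_BATCH % gpus == 0, "keep the per-GPU batch size integral"
    return {"gpus": gpus, "per_gpu_batch": GLOBAL_BATCH // gpus}

# Cluster load drops: scale the job out from 4 to 8 GPUs ...
print(rescale(8))  # {'gpus': 8, 'per_gpu_batch': 128}
# ... load spikes: shrink back to 2 GPUs without killing the job.
print(rescale(2))  # {'gpus': 2, 'per_gpu_batch': 512}
```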

MLaaS in the wild: Workload analysis and scheduling in Large-Scale heterogeneous GPU clusters

Q Weng, W Xiao, Y Yu, W Wang, C Wang, J He… - … USENIX Symposium on …, 2022 - usenix.org
With the sustained technological advances in machine learning (ML) and the availability of
massive datasets recently, tech companies are deploying large ML-as-a-Service (MLaaS) …

HiveD: Sharing a GPU cluster for deep learning with guarantees

H Zhao, Z Han, Z Yang, Q Zhang, F Yang… - … USENIX symposium on …, 2020 - usenix.org
Deep learning training on a shared GPU cluster is becoming a common practice. However,
we observe severe sharing anomaly in production multi-tenant clusters where jobs in some …