AntMan: Dynamic scaling on GPU clusters for deep learning

W Xiao, S Ren, Y Li, Y Zhang, P Hou, Z Li… - … USENIX Symposium on …, 2020 - usenix.org
Efficiently scheduling deep learning jobs on large-scale GPU clusters is crucial for job
performance, system throughput, and hardware utilization. It is getting ever more …

Gandiva: Introspective cluster scheduling for deep learning

W Xiao, R Bhardwaj, R Ramjee, M Sivathanu… - … USENIX Symposium on …, 2018 - usenix.org
We introduce Gandiva, a new cluster scheduling framework that utilizes domain-specific
knowledge to improve latency and efficiency of training deep learning models in a GPU …

Tiresias: A GPU cluster manager for distributed deep learning

J Gu, M Chowdhury, KG Shin, Y Zhu, M Jeon… - … USENIX Symposium on …, 2019 - usenix.org
Deep learning (DL) training jobs bring some unique challenges to existing cluster
managers, such as unpredictable training times, an all-or-nothing execution model, and …

Multi-tenant GPU clusters for deep learning workloads: Analysis and implications

M Jeon, S Venkataraman, J Qian… - Technical report …, 2018 - microsoft.com
With widespread advances in machine learning, a number of large enterprises are
beginning to incorporate machine learning models across a number of products. These …

Analysis of Large-Scale Multi-Tenant GPU clusters for DNN training workloads

M Jeon, S Venkataraman, A Phanishayee… - 2019 USENIX Annual …, 2019 - usenix.org
With widespread advances in machine learning, a number of large enterprises are
beginning to incorporate machine learning models across a number of products. These …

Chronus: A novel deadline-aware scheduler for deep learning training jobs

W Gao, Z Ye, P Sun, Y Wen, T Zhang - … of the ACM Symposium on Cloud …, 2021 - dl.acm.org
Modern GPU clusters support Deep Learning training (DLT) jobs in a distributed manner.
Job scheduling is the key to improving the training performance, resource utilization and …

Heterogeneity-Aware cluster scheduling policies for deep learning workloads

D Narayanan, K Santhanam, F Kazhamiaka… - … USENIX Symposium on …, 2020 - usenix.org
Specialized accelerators such as GPUs, TPUs, FPGAs, and custom ASICs have been
increasingly deployed to train deep learning models. These accelerators exhibit …

Multi-resource interleaving for deep learning training

Y Zhao, Y Liu, Y Peng, Y Zhu, X Liu, X Jin - Proceedings of the ACM …, 2022 - dl.acm.org
Training Deep Learning (DL) models requires multiple resource types, including CPUs,
GPUs, storage IO, and network IO. Advancements in DL have produced a wide spectrum of …

Horus: Interference-aware and prediction-based scheduling in deep learning systems

G Yeung, D Borowiec, R Yang, A Friday… - … on Parallel and …, 2021 - ieeexplore.ieee.org
To accelerate the training of Deep Learning (DL) models, clusters of machines equipped
with hardware accelerators such as GPUs are leveraged to reduce execution time. State-of …

Looking beyond GPUs for DNN scheduling on Multi-Tenant clusters

J Mohan, A Phanishayee, J Kulkarni… - … USENIX Symposium on …, 2022 - usenix.org
Training Deep Neural Networks (DNNs) is a popular workload in both enterprises and cloud
data centers. Existing schedulers for DNN training consider GPU as the dominant resource …