Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent

Q Weng, L Yang, Y Yu, W Wang, X Tang… - 2023 USENIX Annual …, 2023 - usenix.org
Large tech companies are piling up a massive number of GPUs in their server fleets to run
diverse machine learning (ML) workloads. However, these expensive devices often suffer …

Salus: Fine-grained GPU sharing primitives for deep learning applications

P Yu, M Chowdhury - arXiv preprint arXiv:1902.04610, 2019 - arxiv.org
GPU computing is becoming increasingly popular with the proliferation of deep
learning (DL) applications. However, unlike traditional resources such as CPU or the …

Themis: Fair and efficient GPU cluster scheduling

K Mahajan, A Balasubramanian, A Singhvi… - … USENIX Symposium on …, 2020 - usenix.org
Modern distributed machine learning (ML) training workloads benefit significantly from
leveraging GPUs. However, significant contention ensues when multiple such workloads are …

MLaaS in the wild: Workload analysis and scheduling in large-scale heterogeneous GPU clusters

Q Weng, W Xiao, Y Yu, W Wang, C Wang, J He… - … USENIX Symposium on …, 2022 - usenix.org
With the sustained technological advances in machine learning (ML) and the availability of
massive datasets recently, tech companies are deploying large ML-as-a-Service (MLaaS) …

SchedTune: A heterogeneity-aware GPU scheduler for deep learning

H Albahar, S Dongare, Y Du, N Zhao… - 2022 22nd IEEE …, 2022 - ieeexplore.ieee.org
Modern cluster management systems, such as Kubernetes, support heterogeneous
workloads and resources. However, existing resource schedulers in these systems do not …

Analysis of large-scale multi-tenant GPU clusters for DNN training workloads

M Jeon, S Venkataraman, A Phanishayee… - 2019 USENIX Annual …, 2019 - usenix.org
With widespread advances in machine learning, a number of large enterprises are
beginning to incorporate machine learning models across a number of products. These …

Fine-grained GPU sharing primitives for deep learning applications

P Yu, M Chowdhury - Proceedings of Machine Learning and …, 2020 - proceedings.mlsys.org
Unlike traditional resources such as CPU or the network, modern GPUs do not natively
support fine-grained sharing primitives. Consequently, implementing common policies such …

Online evolutionary batch size orchestration for scheduling deep learning workloads in GPU clusters

Z Bian, S Li, W Wang, Y You - … of the International Conference for High …, 2021 - dl.acm.org
Efficient GPU resource scheduling is essential to maximize resource utilization and save
training costs for the increasing amount of deep learning workloads in shared GPU clusters …

CODA: Improving resource utilization by slimming and co-locating DNN and CPU jobs

H Zhao, W Cui, Q Chen, J Leng, K Yu… - 2020 IEEE 40th …, 2020 - ieeexplore.ieee.org
While deep neural network (DNN) models are often trained on GPUs, many companies and
research institutes build GPU clusters that are shared by different groups. On such GPU …

Multi-tenant GPU clusters for deep learning workloads: Analysis and implications

M Jeon, S Venkataraman, J Qian… - Technical report …, 2018 - microsoft.com
With widespread advances in machine learning, a number of large enterprises are
beginning to incorporate machine learning models across a number of products. These …