Learning scheduling algorithms for data processing clusters

H Mao, M Schwarzkopf, SB Venkatakrishnan… - Proceedings of the …, 2019 - dl.acm.org

Efficiently scheduling data processing jobs on distributed compute clusters requires complex
algorithms. Current systems use simple, generalized heuristics and ignore workload …

Cited by 725 · Related articles · All 13 versions

[PDF] usenix.org

Gandiva: Introspective cluster scheduling for deep learning

W Xiao, R Bhardwaj, R Ramjee, M Sivathanu… - … USENIX Symposium on …, 2018 - usenix.org

We introduce Gandiva, a new cluster scheduling framework that utilizes domain-specific
knowledge to improve latency and efficiency of training deep learning models in a GPU …

Cited by 521 · Related articles · All 12 versions

[PDF] mlsys.org

Beyond data and model parallelism for deep neural networks.

Z Jia, M Zaharia, A Aiken - Proceedings of Machine Learning …, 2019 - proceedings.mlsys.org

Existing deep learning systems commonly parallelize deep neural network (DNN) training
using data or model parallelism, but these strategies often result in suboptimal …

Cited by 506 · Related articles · All 13 versions

[PDF] researchgate.net

Cluster resource scheduling in cloud computing: literature review and research challenges

W Khallouli, J Huang - The Journal of supercomputing, 2022 - Springer

Scheduling plays a pivotal role in cloud computing systems. Designing an efficient
scheduler is a challenging task. The challenge comes from several aspects, including the …

Cited by 30 · Related articles · All 4 versions

[PDF] usenix.org

Tiresias: A GPU cluster manager for distributed deep learning

J Gu, M Chowdhury, KG Shin, Y Zhu, M Jeon… - … USENIX Symposium on …, 2019 - usenix.org

Deep learning (DL) training jobs bring some unique challenges to existing cluster
managers, such as unpredictable training times, an all-or-nothing execution model, and …

Cited by 389 · Related articles · All 13 versions

[PDF] kaust.edu.sa

Optimus: an efficient dynamic resource scheduler for deep learning clusters

Y Peng, Y Bao, Y Chen, C Wu, C Guo - Proceedings of the Thirteenth …, 2018 - dl.acm.org

Deep learning workloads are common in today's production clusters due to the proliferation
of deep learning driven AI services (e.g., speech recognition, machine translation). A deep …

Cited by 475 · Related articles · All 3 versions

[HTML] sciencedirect.com

Optimized container scheduling for data-intensive serverless edge computing

T Rausch, A Rashed, S Dustdar - Future Generation Computer Systems, 2021 - Elsevier

Operating data-intensive applications on edge systems is challenging, due to the extreme
workload and device heterogeneity, as well as the geographic dispersion of compute and …

Cited by 155 · Related articles · All 6 versions

[PDF] usenix.org

Serving DNNs like clockwork: Performance predictability from the bottom up

A Gujarati, R Karimi, S Alzayat, W Hao… - … USENIX Symposium on …, 2020 - usenix.org

Machine learning inference is becoming a core building block for interactive web
applications. As a result, the underlying model serving systems on which these applications …

Cited by 225 · Related articles · All 16 versions

[PDF] nsf.gov

Faster and cheaper serverless computing on harvested resources

Y Zhang, Í Goiri, GI Chaudhry, R Fonseca… - Proceedings of the …, 2021 - dl.acm.org

Serverless computing is becoming increasingly popular due to its ease of programming, fast
elasticity, and fine-grained billing. However, the serverless provider still needs to provision …

Cited by 95 · Related articles · All 4 versions

[PDF] usenix.org

Protean: VM allocation service at scale

O Hadary, L Marshall, I Menache, A Pan… - … USENIX Symposium on …, 2020 - usenix.org

We describe the design and implementation of Protean--the Microsoft Azure service
responsible for allocating Virtual Machines (VMs) to millions of servers around the globe. A …

Cited by 149 · Related articles · All 7 versions
