P Yu, M Chowdhury - arXiv preprint arXiv:1902.04610, 2019 - arxiv.org
GPU computing is becoming increasingly popular with the proliferation of deep learning (DL) applications. However, unlike traditional resources such as CPU or the …
Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are …
With sustained technological advances in machine learning (ML) and the recent availability of massive datasets, tech companies are deploying large ML-as-a-Service (MLaaS) …
H Albahar, S Dongare, Y Du, N Zhao… - 2022 22nd IEEE …, 2022 - ieeexplore.ieee.org
Modern cluster management systems, such as Kubernetes, support heterogeneous workloads and resources. However, existing resource schedulers in these systems do not …
With widespread advances in machine learning, many large enterprises are beginning to incorporate machine learning models across a number of products. These …
P Yu, M Chowdhury - Proceedings of Machine Learning and …, 2020 - proceedings.mlsys.org
Unlike traditional resources such as CPU or the network, modern GPUs do not natively support fine-grained sharing primitives. Consequently, implementing common policies such …
Z Bian, S Li, W Wang, Y You - … of the International Conference for High …, 2021 - dl.acm.org
Efficient GPU resource scheduling is essential to maximize resource utilization and save training costs for the growing number of deep learning workloads in shared GPU clusters …
While deep neural network (DNN) models are often trained on GPUs, many companies and research institutes build GPU clusters that are shared by different groups. On such GPU …