MISO: exploiting multi-instance GPU capability on multi-tenant GPU clusters

B Li, T Patel, S Samsi, V Gadepally… - Proceedings of the 13th …, 2022 - dl.acm.org
GPU technology has been improving at an expedited pace in terms of size and performance,
empowering HPC and AI/ML researchers to advance the scientific discovery process …

Characterizing multi-instance GPU for machine learning workloads

B Li, V Gadepally, S Samsi… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
As machine learning (ML) becomes more and more popular, datacenter operators use
hardware accelerators such as GPUs to tackle the high computation demand of ML …

MLaaS in the wild: Workload analysis and scheduling in large-scale heterogeneous GPU clusters

Q Weng, W Xiao, Y Yu, W Wang, C Wang, J He… - … USENIX Symposium on …, 2022 - usenix.org
With the sustained technological advances in machine learning (ML) and the availability of
massive datasets recently, tech companies are deploying large ML-as-a-Service (MLaaS) …

A virtual memory based runtime to support multi-tenancy in clusters with GPUs

M Becchi, K Sajjapongse, I Graves, A Procter… - Proceedings of the 21st …, 2012 - dl.acm.org
Graphics Processing Units (GPUs) are increasingly becoming part of HPC clusters.
Nevertheless, cloud computing services and resource management frameworks targeting …

Multi-tenancy on GPGPU-based servers

D Sengupta, R Belapure, K Schwan - Proceedings of the 7th …, 2013 - dl.acm.org
While GPUs have become prominent both in high performance computing and in online or
cloud services, they still appear as explicitly selected 'devices' rather than as first class …

Slate: Enabling workload-aware efficient multiprocessing for modern GPGPUs

T Allen, X Feng, R Ge - 2019 IEEE international parallel and …, 2019 - ieeexplore.ieee.org
As GPUs now contribute the majority of computing power for HPC and data centers,
improving GPU utilization becomes an important research problem. Sharing GPU among …

Tartan: evaluating modern GPU interconnect via a multi-GPU benchmark suite

A Li, SL Song, J Chen, X Liu, N Tallent… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
High performance multi-GPU computing becomes an inevitable trend due to the ever-
increasing demand on computation capability in emerging domains such as deep learning …

Scaling scientific applications on clusters of hybrid multicore/GPU nodes

L Wang, M Huang, VK Narayana… - Proceedings of the 8th …, 2011 - dl.acm.org
Rapid advances in the performance and programmability of graphics accelerators have
made GPU computing a compelling solution for a wide variety of application domains …

Themis: Fair and efficient GPU cluster scheduling

K Mahajan, A Balasubramanian, A Singhvi… - … USENIX Symposium on …, 2020 - usenix.org
Modern distributed machine learning (ML) training workloads benefit significantly from
leveraging GPUs. However, significant contention ensues when multiple such workloads are …

Interference-driven resource management for GPU-based heterogeneous clusters

R Phull, CH Li, K Rao, H Cadambi… - Proceedings of the 21st …, 2012 - dl.acm.org
GPU-based clusters are increasingly being deployed in HPC environments to accelerate a
variety of scientific applications. Despite their growing popularity, the GPU devices …