CuMAS: Data transfer aware multi-application scheduling for shared GPUs

ME Belviranli, F Khorasani, LN Bhuyan… - Proceedings of the 2016 …, 2016 - dl.acm.org
Recent generations of GPUs and their corresponding APIs provide means for sharing
compute resources among multiple applications with greater efficiency than ever. This …

Slate: Enabling workload-aware efficient multiprocessing for modern GPGPUs

T Allen, X Feng, R Ge - 2019 IEEE international parallel and …, 2019 - ieeexplore.ieee.org
As GPUs now contribute the majority of computing power for HPC and data centers,
improving GPU utilization becomes an important research problem. Sharing GPU among …

Towards multi-tenant GPGPU: Event-driven programming model for system-wide scheduling on shared GPUs

Y Suzuki, H Yamada, S Kato, K Kono - Proceedings of the Workshop …, 2016 - cs.utexas.edu
Graphics processing units (GPUs) are attractive for general-purpose computing (GPGPU)
beyond the graphics purpose. Sharing GPUs among such GPGPU applications is a key …

Enabling preemptive multiprogramming on GPUs

I Tanasic, I Gelado, J Cabezas, A Ramirez… - ACM SIGARCH …, 2014 - dl.acm.org
GPUs are being increasingly adopted as compute accelerators in many domains, spanning
environments from mobile systems to cloud computing. These systems are usually running …

Astraea: towards QoS-aware and resource-efficient multi-stage GPU services

W Zhang, Q Chen, K Fu, N Zheng, Z Huang… - Proceedings of the 27th …, 2022 - dl.acm.org
Multi-stage user-facing applications on GPUs are widely used nowadays, and are often
implemented as microservices. Prior research works are not applicable to ensuring QoS …

Anatomy of GPU memory system for multi-application execution

A Jog, O Kayiran, T Kesten, A Pattnaik… - Proceedings of the …, 2015 - dl.acm.org
As GPUs make headway in the computing landscape spanning mobile platforms,
supercomputers, cloud and virtual desktop platforms, supporting concurrent execution of …

Toward supporting multi-GPU targets via taskloop and user-defined schedules

V Kale, W Lu, A Curtis, AM Malik, B Chapman… - OpenMP: Portable Multi …, 2020 - Springer
Many modern supercomputers such as ORNL's Summit, LLNL's Sierra, and LBL's upcoming
Perlmutter offer or will offer multiple, e.g., 4 to 8, GPUs per node for running computational …

Simultaneous multikernel: Fine-grained sharing of GPUs

Z Wang, J Yang, R Melhem, B Childers… - IEEE Computer …, 2015 - ieeexplore.ieee.org
Studies show that non-graphics programs can be less optimized for the GPU hardware,
leading to significant resource under-utilization. Sharing the GPU among multiple programs …

CASE: A compiler-assisted scheduling framework for multi-GPU systems

C Chen, C Porter, S Pande - Proceedings of the 27th ACM SIGPLAN …, 2022 - dl.acm.org
Modern computing platforms tend to deploy multiple GPUs on a single node to boost
performance. GPUs have large computing capacities and are an expensive resource …

Automatically exploiting implicit pipeline parallelism from multiple dependent kernels for GPUs

G Kim, J Jeong, J Kim, M Stephenson - Proceedings of the 2016 …, 2016 - dl.acm.org
Execution of GPGPU workloads consists of different stages including data I/O on the CPU,
memory copy between the CPU and GPU, and kernel execution. While GPU can remain idle …