Orion: Interference-aware, Fine-grained GPU Sharing for ML Applications

F Strati, X Ma, A Klimovic - … of the Nineteenth European Conference on …, 2024 - dl.acm.org
GPUs are critical for maximizing the throughput-per-Watt of deep neural network (DNN)
applications. However, DNN applications often underutilize GPUs, even when using large …

Strategies for maximizing utilization on multi-CPU and multi-GPU heterogeneous architectures

A Navarro, A Vilches, F Corbera, R Asenjo - The Journal of …, 2014 - Springer
This paper explores the possibility of efficiently executing a single application using
multicores simultaneously with multiple GPU accelerators under a parallel task …

TurboMGNN: Improving concurrent GNN training tasks on GPU with fine-grained kernel fusion

W Wu, X Shi, L He, H Jin - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org
Graph Neural Networks (GNN) have evolved as powerful models for graph representation
learning. Many works have been proposed to support GNN training efficiently on GPU …

Scaling scientific applications on clusters of hybrid multicore/GPU nodes

L Wang, M Huang, VK Narayana… - Proceedings of the 8th …, 2011 - dl.acm.org
Rapid advances in the performance and programmability of graphics accelerators have
made GPU computing a compelling solution for a wide variety of application domains …

Characterizing fine-grained resource utilization for multitasking GPGPU in cloud systems

K Cho, H Bahn - IEEE Access, 2021 - ieeexplore.ieee.org
Managing GPGPU resources in cloud systems is challenging as workloads with various
resource usage patterns coexist. To determine the co-location of workloads, previous …

RAISE: Efficient GPU resource management via hybrid scheduling

Y Weng, T Ge, X Zhang, X Zhang… - 2022 22nd IEEE …, 2022 - ieeexplore.ieee.org
As the de facto high-throughput accelerators, graphics processing units (GPUs) are now
used in a wide spectrum of fields, including artificial intelligence, high performance …

A programming model for massive data parallelism with data dependencies

Y Zhang, F Mueller, X Cui, T Potok - Workshop on Programming …, 2009 - arcb.csc.ncsu.edu
Accelerating processors can often be more cost and energy effective for a wide range of
data-parallel computing problems than general-purpose processors. For graphics processor …

Analyzing fine-grained resource utilization for efficient GPU workload allocation

Y Park, D Shin, K Cho, H Bahn - The Journal of The Institute of …, 2019 - koreascience.kr
Recently, GPU expands application domains from graphic processing to various kinds of
parallel workloads. However, current GPU systems focus on the maximization of each …

Tacker: Tensor-CUDA core kernel fusion for improving the GPU utilization while ensuring QoS

H Zhao, W Cui, Q Chen, Y Zhang, Y Lu… - … Symposium on High …, 2022 - ieeexplore.ieee.org
The proliferation of machine learning applications has promoted both CUDA Cores and
Tensor Cores' integration to meet their acceleration demands. While studies have shown …

Gemma in April: A matrix-like parallel programming architecture on OpenCL

T Wu, D Wu, Y Wang, X Zhang, H Luo… - … , Automation & Test …, 2011 - ieeexplore.ieee.org
Nowadays, Graphics Processing Unit (GPU), as a kind of massive parallel processor, has
been widely used in general purposed computing tasks. Although there have been mature …