Deadline-aware offloading for high-throughput accelerators

TT Yeh, MD Sinclair, BM Beckmann… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Contemporary GPUs are widely used for throughput-oriented data-parallel workloads and
are increasingly being considered for latency-sensitive applications in datacenters …

Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs

A Dhakal, SG Kulkarni, KK Ramakrishnan - arXiv preprint arXiv …, 2023 - arxiv.org
Hardware accelerators such as GPUs are required for real-time, low-latency inference with
Deep Neural Networks (DNN). However, due to the inherent limits to the parallelism they …

ArkGPU: enabling applications' high-goodput co-location execution on multitasking GPUs

J Lou, Y Sun, J Zhang, H Cao, Y Zhang… - CCF Transactions on High …, 2023 - Springer
With the development of deep learning, hardware accelerators, typified by GPUs, have
been used to accelerate the execution of deep learning applications. A key problem in GPU …

Characterizing concurrency mechanisms for NVIDIA GPUs under deep learning workloads

G Gilman, RJ Walls - ACM SIGMETRICS Performance Evaluation …, 2022 - dl.acm.org
Hazelwood et al. observed that at Facebook data centers, variations in user activity (e.g., due
to diurnal load) resulted in low utilization periods with large pools of idle resources [4]. To …

DeepBoot: Dynamic Scheduling System for Training and Inference Deep Learning Tasks in GPU Cluster

Z Chen, X Zhao, C Zhi, J Yin - IEEE Transactions on Parallel …, 2023 - ieeexplore.ieee.org
Deep learning tasks (DLT) include training and inference tasks, where training DLTs aim to
minimize average job completion time (JCT) and inference tasks need …

Ebird: Elastic batch for improving responsiveness and throughput of deep learning services

W Cui, M Wei, Q Chen, X Tang, J Leng… - 2019 IEEE 37th …, 2019 - ieeexplore.ieee.org
GPUs have been widely adopted to serve online deep learning-based services that have
stringent QoS requirements. However, emerging deep learning serving systems often result …

EDGE: Event-driven GPU execution

TH Hetherington, M Lubeznov, D Shah… - 2019 28th …, 2019 - ieeexplore.ieee.org
GPUs are known to benefit structured applications with ample parallelism, such as deep
learning in a datacenter. Recently, GPUs have shown promise for irregular streaming …

CARSS: Client-aware resource sharing and scheduling for heterogeneous applications

I Baek, M Harding, A Kanda, KR Choi… - 2020 IEEE Real …, 2020 - ieeexplore.ieee.org
Modern hardware accelerators such as GP-GPUs and DSPs are commonly used in
real-time settings such as high-performance multimedia systems and autonomous vehicles …

SchedTune: A heterogeneity-aware GPU scheduler for deep learning

H Albahar, S Dongare, Y Du, N Zhao… - 2022 22nd IEEE …, 2022 - ieeexplore.ieee.org
Modern cluster management systems, such as Kubernetes, support heterogeneous
workloads and resources. However, existing resource schedulers in these systems do not …