Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision

W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning (DL) has flourished in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …

BATCH: Machine learning inference serving on serverless platforms with adaptive batching

A Ali, R Pinciroli, F Yan, E Smirni - … International Conference for …, 2020 - ieeexplore.ieee.org
Serverless computing is a new pay-per-use cloud service paradigm that automates resource
scaling for stateless functions and can potentially facilitate bursty machine learning serving …

Coordinated batching and DVFS for DNN inference on GPU accelerators

SM Nabavinejad, S Reda… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Employing hardware accelerators to improve the performance and energy-efficiency of DNN
applications is on the rise. One challenge of using hardware accelerators, including the GPU …

TBDB: Token bucket-based dynamic batching for resource scheduling supporting neural network inference in intelligent consumer electronics

H Gao, B Qiu, Y Wang, S Yu, Y Xu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Consumer electronics such as mobile phones, wearable devices, and vehicle electronics
use many intelligent applications such as voice commands, machine translation, and face …

EAIS: Energy-aware adaptive scheduling for CNN inference on high-performance GPUs

C Yao, W Liu, W Tang, S Hu - Future Generation Computer Systems, 2022 - Elsevier
Recently, a large number of convolutional neural network (CNN) inference services have
emerged on high-performance Graphics Processing Units (GPUs). However, GPUs are high …

Kalmia: A heterogeneous QoS-aware scheduling framework for DNN tasks on edge servers

Z Fu, J Ren, D Zhang, Y Zhou… - IEEE INFOCOM 2022 …, 2022 - ieeexplore.ieee.org
Motivated by the popularity of edge intelligence, DNN services have been widely deployed
at the edge, posing significant performance pressure on edge servers. How to improve the …

Iris: Interference and resource aware predictive orchestration for ML inference serving

A Ferikoglou, P Chrysomeris… - 2023 IEEE 16th …, 2023 - ieeexplore.ieee.org
Over the last years, the ever-growing number of Machine Learning (ML) and Artificial
Intelligence (AI) applications deployed in the Cloud has led to high demands on the …

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

F Liang, Z Zhang, H Lu, V Leung, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid growth in the volume of data sets, models, and devices in the domain of deep
learning, there is increasing attention on large-scale distributed deep learning. In contrast to …

Perseus: Characterizing performance and cost of multi-tenant serving for CNN models

M LeMay, S Li, T Guo - 2020 IEEE International Conference on …, 2020 - ieeexplore.ieee.org
Deep learning models are increasingly used for end-user applications, supporting both
novel features such as facial recognition, and traditional features, e.g., web search. To …