Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

F Liang, Z Zhang, H Lu, V Leung, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid growth in the volume of data sets, models, and devices in the domain of deep
learning, there is increasing attention on large-scale distributed deep learning. In contrast to …

Clover: Toward sustainable AI with carbon-aware machine learning inference service

B Li, S Samsi, V Gadepally, D Tiwari - Proceedings of the International …, 2023 - dl.acm.org
This paper presents a solution to the challenge of mitigating carbon emissions from hosting
large-scale machine learning (ML) inference services. ML inference is critical to modern …

Graft: Efficient inference serving for hybrid deep learning with SLO guarantees via DNN re-alignment

J Wu, L Wang, Q Jin, F Liu - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org
Deep neural networks (DNNs) have been widely adopted for various mobile inference tasks,
yet their ever-increasing computational demands are hindering their deployment on …

A stochastic approach for scheduling AI training jobs in GPU-based systems

F Filippini, J Anselmi, D Ardagna… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In this work, we optimize the scheduling of Deep Learning (DL) training jobs from the
perspective of a Cloud Service Provider running a data center that efficiently selects …

HetSev: Exploiting Heterogeneity-Aware Autoscaling and Resource-Efficient Scheduling for Cost-Effective Machine-Learning Model Serving

H Mo, L Zhu, L Shi, S Tan, S Wang - Electronics, 2023 - mdpi.com
To accelerate inference in machine-learning (ML) model serving, clusters of machines
require expensive hardware accelerators (e.g., GPUs) to reduce execution time …

Opara: Exploiting Operator Parallelism for Expediting DNN Inference on GPUs

A Chen, F Xu, L Han, Y Dong, L Chen, Z Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
GPUs have become the de facto hardware devices for accelerating Deep Neural Network
(DNN) inference in deep learning (DL) frameworks. However, the conventional sequential …

Opportunities of Renewable Energy Powered DNN Inference

SM Nabavinejad, T Guo - arXiv preprint arXiv:2306.12247, 2023 - arxiv.org
With the growing adoption of renewable energy to power data centers,
addressing the challenges of such energy sources has attracted researchers from academia …

Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey

F Liang, Z Zhang, H Lu, C Li, V Leung, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With rapidly increasing distributed deep learning workloads in large-scale data centers,
efficient strategies for resource allocation and workload …

Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems

D Sanyal, JT Hung, M Agrawal, P Jasti… - arXiv preprint arXiv …, 2023 - arxiv.org
With the emergence of large foundation models, model-serving systems are becoming
popular. In such a system, users send queries to the server and specify the desired …