Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

F Liang, Z Zhang, H Lu, V Leung, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid growth in the volume of data sets, models, and devices in the domain of deep
learning, there is increasing attention on large-scale distributed deep learning. In contrast to …

PED: Probabilistic Energy-efficient Deadline-aware scheduler for heterogeneous SoCs

X Chen, A Krishnakumar, U Ogras… - Journal of Systems …, 2024 - Elsevier
Heterogeneous systems-on-chip (SoCs) integrate diverse cores with different performance
and energy tradeoffs. Scheduling applications with soft deadline constraints is highly …

Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey

F Liang, Z Zhang, H Lu, C Li, V Leung, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With rapidly increasing distributed deep learning workloads in large-scale data centers,
efficient distributed deep learning framework strategies for resource allocation and workload …

UniSched: A Unified Scheduler for Deep Learning Training Jobs with Different User Demands

W Gao, Z Ye, P Sun, T Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The growth of deep learning training (DLT) jobs in modern GPU clusters calls for efficient
deep learning (DL) scheduler designs. Due to the extensive applications of DL technology …

Simultaneous and Heterogenous Multithreading

KC Hsu, HW Tseng - Proceedings of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
The landscape of modern computers is undoubtedly heterogeneous, as all computing
platforms integrate multiple types of processing units and hardware accelerators. However …

A Stochastic Approach for Scheduling AI Training Jobs in GPU-based Systems

F Filippini, J Anselmi, D Ardagna… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In this work, we optimize the scheduling of Deep Learning (DL) training jobs from the
perspective of a Cloud Service Provider running a data center, which efficiently selects …

Hadar: Heterogeneity-Aware Optimization-Based Online Scheduling for Deep Learning Cluster

A Sultana, F Xu, X Yuan, L Chen, NF Tzeng - prefer-nsf.org
With the wide adoption of deep neural network (DNN) models for various applications,
enterprises and cloud providers have built deep learning clusters and increasingly …