Analyzing machine learning workloads using a detailed GPU simulator

J Lew, DA Shah, S Pati, S Cattell… - … analysis of systems …, 2019 - ieeexplore.ieee.org
Machine learning (ML) has recently emerged as an important application driving future
architecture design. Traditionally, architecture research has used detailed simulators to …

LAS: Locality-aware scheduling for GEMM-accelerated convolutions in GPUs

H Kim, WJ Song - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org
This article presents a graphics processing unit (GPU) scheduling scheme that maximizes
the exploitation of data locality in deep neural networks (DNNs). Convolution is one of the …

Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling

S Jayaram Subramanya, D Arfeen, S Lin… - Proceedings of the 29th …, 2023 - dl.acm.org
The Sia scheduler efficiently assigns heterogeneous deep learning (DL) cluster resources to
elastic resource-adaptive jobs. Although some recent schedulers address one aspect or …

Elastic deep learning in multi-tenant GPU clusters

Y Wu, K Ma, X Yan, Z Liu, Z Cai… - … on Parallel and …, 2021 - ieeexplore.ieee.org
We study how to support elasticity, that is, the ability to dynamically adjust the parallelism (i.e.,
the number of GPUs), for deep neural network (DNN) training in a GPU cluster. Elasticity can …

Characterizing concurrency mechanisms for NVIDIA GPUs under deep learning workloads

G Gilman, RJ Walls - ACM SIGMETRICS Performance Evaluation …, 2022 - dl.acm.org
Hazelwood et al. observed that at Facebook data centers, variations in user activity (e.g., due
to diurnal load) resulted in low utilization periods with large pools of idle resources [4]. To …

Multi-model machine learning inference serving with GPU spatial partitioning

S Choi, S Lee, Y Kim, J Park, Y Kwon, J Huh - arXiv preprint arXiv …, 2021 - arxiv.org
As machine learning techniques are applied to a widening range of applications, high
throughput machine learning (ML) inference servers have become critical for online service …

Topology-aware GPU scheduling for learning workloads in cloud environments

M Amaral, J Polo, D Carrera, S Seelam… - Proceedings of the …, 2017 - dl.acm.org
Recent advances in hardware, such as systems with multiple GPUs and their availability in
the cloud, are enabling deep learning in various domains including health care …

Aryl: An elastic cluster scheduler for deep learning

J Li, H Xu, Y Zhu, Z Liu, C Guo, C Wang - arXiv preprint arXiv:2202.07896, 2022 - arxiv.org
Companies build separate training and inference GPU clusters for deep learning, and use
separate schedulers to manage them. This leads to problems for both training and inference …

EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs

M Li, W Xiao, H Yang, B Sun, H Zhao, S Ren… - Proceedings of the …, 2023 - dl.acm.org
Distributed synchronized GPU training is commonly used for deep learning. The resource
constraint of using a fixed number of GPUs makes large-scale training jobs suffer from long …

Tiresias: A GPU cluster manager for distributed deep learning

J Gu, M Chowdhury, KG Shin, Y Zhu, M Jeon… - … USENIX Symposium on …, 2019 - usenix.org
Deep learning (DL) training jobs bring some unique challenges to existing cluster
managers, such as unpredictable training times, an all-or-nothing execution model, and …