Fine-grained scheduling for containerized HPC workloads in Kubernetes clusters

P Liu, J Guitart - 2022 IEEE 24th Int Conf on High Performance …, 2022 - ieeexplore.ieee.org
Containerization technology offers lightweight OS-level virtualization, and enables
portability, reproducibility, and flexibility by packing applications with low performance …

Horus: Interference-aware and prediction-based scheduling in deep learning systems

G Yeung, D Borowiec, R Yang, A Friday… - … on Parallel and …, 2021 - ieeexplore.ieee.org
To accelerate the training of Deep Learning (DL) models, clusters of machines equipped
with hardware accelerators such as GPUs are leveraged to reduce execution time. State-of …

DRAGON: A Dynamic Scheduling and Scaling Controller for Managing Distributed Deep Learning Jobs in Kubernetes Cluster

CY Lin, TA Yeh, J Chou - CLOSER, 2019 - scitepress.org
With the fast growing trend in deep learning driven AI services over the past decade, deep
learning, especially the resource-intensive and time-consuming training jobs, have become …

MemcachedGPU: Scaling-up scale-out key-value stores

TH Hetherington, M O'Connor, TM Aamodt - Proceedings of the Sixth …, 2015 - dl.acm.org
This paper tackles the challenges of obtaining more efficient data center computing while
maintaining low latency, low cost, programmability, and the potential for workload …

Gaia scheduler: A Kubernetes-based scheduler framework

S Song, L Deng, J Gong, H Luo - … IEEE Intl Conf on Parallel & …, 2018 - ieeexplore.ieee.org
This paper proposed a topology-based GPU scheduling framework. The framework is based
on the traditional Kubernetes GPU scheduling algorithm. In existing algorithms, GPU can …

Improving data center efficiency through holistic scheduling in Kubernetes

P Townend, S Clement, D Burdett… - … on Service-Oriented …, 2019 - ieeexplore.ieee.org
Data centers are the infrastructure that underpins modern distributed service-oriented
systems. They are complex systems-of-systems, with many interacting elements, that …

Topology-aware GPU scheduling for learning workloads in cloud environments

M Amaral, J Polo, D Carrera, S Seelam… - Proceedings of the …, 2017 - dl.acm.org
Recent advances in hardware, such as systems with multiple GPUs and their availability in
the cloud, are enabling deep learning in various domains including health care …

Accelerator-aware Kubernetes scheduler for DNN tasks on edge computing environment

J Park, U Choi, S Kum, J Moon… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
The compute capability of edge devices is expanding owing to the wide adoption of edge
computing for various application scenarios and specialized hardware explicitly developed …

Deadline-aware offloading for high-throughput accelerators

TT Yeh, MD Sinclair, BM Beckmann… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Contemporary GPUs are widely used for throughput-oriented data-parallel workloads and
increasingly are being considered for latency-sensitive applications in datacenters …

Hybrid Computing for Interactive Datacenter Applications

P Patel, K Lim, K Jhunjhunwalla, A Martinez… - arXiv preprint arXiv …, 2023 - arxiv.org
Field-Programmable Gate Arrays (FPGAs) are more energy efficient and cost effective than
CPUs for a wide variety of datacenter applications. Yet, for latency-sensitive and bursty …