A survey of multi-tenant deep learning inference on GPU

F Yu, D Wang, L Shangguan, M Zhang, C Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep Learning (DL) models have achieved superior performance. Meanwhile, computing
hardware like NVIDIA GPUs also demonstrated strong computing scaling trends with 2x …

One-shot tuner for deep learning compilers

J Ryu, E Park, H Sung - Proceedings of the 31st ACM SIGPLAN …, 2022 - dl.acm.org
Auto-tuning DL compilers are gaining ground as an optimizing back-end for DL frameworks.
While existing work can generate deep learning models that exceed the performance of …

MetaTune: Meta-learning based cost model for fast and efficient auto-tuning frameworks

J Ryu, H Sung - arXiv preprint arXiv:2102.04199, 2021 - arxiv.org
Deep learning compiler frameworks are gaining ground as a more portable back-end for
deep learning applications on increasingly diverse hardware. However, they face the …

Exploring the Diversity of Multiple Job Deployments over GPUs for Efficient Resource Sharing

T Adufu, J Ha, Y Kim - 2024 International Conference on …, 2024 - ieeexplore.ieee.org
Graphic Processing Units (GPUs) are gradually becoming mainstream computing resource
for efficient execution of applications both on-premises and in the cloud. Currently however …

Slice-tune: A system for high performance DNN autotuning

A Dhakal, KK Ramakrishnan, SG Kulkarni… - Proceedings of the 23rd …, 2022 - dl.acm.org
Autotuning DNN models prior to their deployment is an essential but time-consuming task.
Using expensive (and power-hungry) GPU and TPU accelerators efficiently is also key …

Job Recommendation Service for GPU Sharing in Kubernetes

A Ray, K Lafata, Z Zhang, Y Xiong… - 2023 IEEE Cloud …, 2023 - ieeexplore.ieee.org
Cloud infrastructures encourage the multi-tenancy of hardware resources. User-defined
Machine Learning (ML) training jobs are offloaded to the cloud for efficient training. State-of …

Application-aware Resource Sharing using Software and Hardware Partitioning on Modern GPUs

T Adufu, J Ha, Y Kim - … 2024-2024 IEEE Network Operations and …, 2024 - ieeexplore.ieee.org
Graphic Processing Units (GPUs) are known for the large computing capabilities they offer
users compared to traditional CPUs. However, the issue of resource under-utilization is …

[BOOK][B] Cooperative Design of Machine Learning and GPU-Based Systems for Inference

A Dhakal - 2022 - search.proquest.com
Our work seeks to improve and adapt computing systems and machine learning (ML)
algorithms to match each other's requirements and capabilities. Due to the high …