Fifer: Tackling resource underutilization in the serverless era

JR Gunasekaran, P Thinakaran… - Proceedings of the 21st …, 2020 - dl.acm.org
Datacenters are witnessing a rapid surge in the adoption of serverless functions for
microservices-based applications. A vast majority of these microservices typically span less …

Nanily: A QoS-aware scheduling for DNN inference workload in clouds

X Tang, P Wang, Q Liu, W Wang… - 2019 IEEE 21st …, 2019 - ieeexplore.ieee.org
DNN inference is widely emerging as a service and must run at sub-second latency,
which requires GPU hardware for parallel acceleration. Prior works to improve the …

AutoSched: An Adaptive Self-configured Framework for Scheduling Deep Learning Training Workloads

W Gao, X Zhang, S Huang, S Guo, P Sun… - Proceedings of the 38th …, 2024 - dl.acm.org
Modern Deep Learning Training (DLT) schedulers in GPU datacenters are designed to be
very sophisticated with many configurations. These configurations need to be adjusted …

It's a Scheduling Affair: GROMACS in the Cloud with the KubeFlux Scheduler

C Misale, M Drocco, DJ Milroy… - … on Containers and …, 2021 - ieeexplore.ieee.org
In this work, we address the problem of running HPC workloads efficiently on Kubernetes
clusters. To do so, we compare Kubernetes' default scheduler with KubeFlux, a …

Pipe-torch: Pipeline-based distributed deep learning in a GPU cluster with heterogeneous networking

J Zhan, J Zhang - … Conference on Advanced Cloud and Big …, 2019 - ieeexplore.ieee.org
Because training a deep neural network (DNN) takes arduous amounts of time and
computation, often researchers expedite the training process via distributed parallel training …

Preemptive and low latency datacenter scheduling via lightweight containers

W Chen, X Zhou, J Rao - IEEE Transactions on Parallel and …, 2019 - ieeexplore.ieee.org
Datacenters are evolving to host heterogeneous workloads on shared clusters to reduce the
operational cost and achieve higher resource utilization. However, it is challenging to …

iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud

F Xu, J Xu, J Chen, L Chen, R Shang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
GPUs are essential to accelerating the latency-sensitive deep neural network (DNN)
inference workloads in cloud datacenters. To fully utilize GPU resources, spatial sharing of …

[PDF] Multi-tenant GPU clusters for deep learning workloads: Analysis and implications

M Jeon, S Venkataraman, J Qian… - Technical report …, 2018 - microsoft.com
With widespread advances in machine learning, a number of large enterprises are
beginning to incorporate machine learning models across a number of products. These …

Looking beyond GPUs for DNN scheduling on Multi-Tenant clusters

J Mohan, A Phanishayee, J Kulkarni… - … USENIX Symposium on …, 2022 - usenix.org
Training Deep Neural Networks (DNNs) is a popular workload in both enterprises and cloud
data centers. Existing schedulers for DNN training consider GPU as the dominant resource …

Astraea: A fair deep learning scheduler for multi-tenant GPU clusters

Z Ye, P Sun, W Gao, T Zhang, X Wang… - … on Parallel and …, 2021 - ieeexplore.ieee.org
Modern GPU clusters are designed to support distributed Deep Learning jobs from multiple
tenants concurrently. Each tenant may have varied and dynamic resource demands …