CoGNN: Efficient Scheduling for Concurrent GNN Training on GPUs

Q Sun, Y Liu, H Yang, R Zhang, M Dun… - … Conference for High …, 2022 - ieeexplore.ieee.org
Graph neural networks (GNNs) suffer from low GPU utilization due to frequent memory
accesses. Existing concurrent training mechanisms cannot be directly adapted to GNNs …

Characterizing Power Management Opportunities for LLMs in the Cloud

P Patel, E Choukse, C Zhang, Í Goiri, B Warrier… - Proceedings of the 29th …, 2024 - dl.acm.org
Recent innovations in large language models (LLMs) and their myriad use cases have
rapidly driven up the compute demand for datacenter GPUs. Several cloud providers and …

FuncPipe: A Pipelined Serverless Framework for Fast and Cost-Efficient Training of Deep Learning Models

Y Liu, B Jiang, T Guo, Z Huang, W Ma, X Wang… - Proceedings of the …, 2022 - dl.acm.org
Training deep learning (DL) models in the cloud has become the norm. With the emergence of
serverless computing and its benefits of true pay-as-you-go pricing and scalability, systems …

SchedTune: A Heterogeneity-Aware GPU Scheduler for Deep Learning

H Albahar, S Dongare, Y Du, N Zhao… - 2022 22nd IEEE …, 2022 - ieeexplore.ieee.org
Modern cluster management systems, such as Kubernetes, support heterogeneous
workloads and resources. However, existing resource schedulers in these systems do not …

Astraea: A Fair Deep Learning Scheduler for Multi-Tenant GPU Clusters

Z Ye, P Sun, W Gao, T Zhang, X Wang… - … on Parallel and …, 2021 - ieeexplore.ieee.org
Modern GPU clusters are designed to support distributed Deep Learning jobs from multiple
tenants concurrently. Each tenant may have varied and dynamic resource demands …

How Different Are the Cloud Workloads? Characterizing Large-Scale Private and Public Cloud Workloads

X Qin, M Ma, Y Zhao, J Zhang, C Du… - 2023 53rd Annual …, 2023 - ieeexplore.ieee.org
With the rapid development of cloud systems, an increasing number of service workloads
are deployed in the private cloud and/or public cloud. Although large cloud providers such …

Characterizing Multi-Instance GPU for Machine Learning Workloads

B Li, V Gadepally, S Samsi… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
As machine learning (ML) becomes increasingly popular, datacenter operators use
hardware accelerators such as GPUs to tackle the high computational demand of ML …

Hydra: Deadline-Aware and Efficiency-Oriented Scheduling for Deep Learning Jobs on Heterogeneous GPUs

Z Yang, H Wu, Y Xu, Y Wu, H Zhong… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the rapid proliferation of deep learning (DL) jobs running on heterogeneous GPUs,
scheduling DL jobs to meet various scheduling requirements, such as meeting deadlines …

TapFinger: Task Placement and Fine-Grained Resource Allocation for Edge Machine Learning

Y Li, T Zeng, X Zhang, J Duan… - IEEE INFOCOM 2023 …, 2023 - ieeexplore.ieee.org
Machine learning (ML) tasks are one of the major workloads in today's edge computing
networks. Existing edge-cloud schedulers allocate the requested amounts of resources to …

Heet: Accelerating Elastic Training in Heterogeneous Deep Learning Clusters

Z Mo, H Xu, C Xu - Proceedings of the 29th ACM International …, 2024 - dl.acm.org
Modern GPU clusters inherently exhibit heterogeneity, encompassing various aspects such
as computation and communication. This heterogeneity poses a significant challenge for the …