Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision

W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning (DL) has flourished in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …

BATCH: Machine learning inference serving on serverless platforms with adaptive batching

A Ali, R Pinciroli, F Yan, E Smirni - … International Conference for …, 2020 - ieeexplore.ieee.org
Serverless computing is a new pay-per-use cloud service paradigm that automates resource
scaling for stateless functions and can potentially facilitate bursty machine learning serving …

Coordinated batching and DVFS for DNN inference on GPU accelerators

SM Nabavinejad, S Reda… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Employing hardware accelerators to improve the performance and energy-efficiency of DNN
applications is on the rise. One challenge of using hardware accelerators, including the GPU …

TBDB: Token bucket-based dynamic batching for resource scheduling supporting neural network inference in intelligent consumer electronics

H Gao, B Qiu, Y Wang, S Yu, Y Xu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Consumer electronics such as mobile phones, wearable devices, and vehicle electronics
use many intelligent applications such as voice commands, machine translation, and face …

EAIS: Energy-aware adaptive scheduling for CNN inference on high-performance GPUs

C Yao, W Liu, W Tang, S Hu - Future Generation Computer Systems, 2022 - Elsevier
Recently, a large number of convolutional neural network (CNN) inference services have
emerged on high-performance Graphics Processing Units (GPUs). However, GPUs are high …

Kalmia: A heterogeneous QoS-aware scheduling framework for DNN tasks on edge servers

Z Fu, J Ren, D Zhang, Y Zhou… - IEEE INFOCOM 2022 …, 2022 - ieeexplore.ieee.org
Motivated by the popularity of edge intelligence, DNN services have been widely deployed
at the edge, posing significant performance pressure on edge servers. How to improve the …

Iris: Interference and resource aware predictive orchestration for ML inference serving

A Ferikoglou, P Chrysomeris… - 2023 IEEE 16th …, 2023 - ieeexplore.ieee.org
Over the last years, the ever-growing number of Machine Learning (ML) and Artificial
Intelligence (AI) applications deployed in the Cloud has led to high demands on the …

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

F Liang, Z Zhang, H Lu, V Leung, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid growth in the volume of data sets, models, and devices in the domain of deep
learning, there is increasing attention on large-scale distributed deep learning. In contrast to …

Perseus: Characterizing performance and cost of multi-tenant serving for CNN models

M LeMay, S Li, T Guo - 2020 IEEE International Conference on …, 2020 - ieeexplore.ieee.org
Deep learning models are increasingly used for end-user applications, supporting both
novel features such as facial recognition, and traditional features, e.g., web search. To …