Nanily: A QoS-aware scheduling for DNN inference workload in clouds

X Tang, P Wang, Q Liu, W Wang… - 2019 IEEE 21st …, 2019 - ieeexplore.ieee.org
DNN inference is widely emerging as a service and must run with sub-second latency,
which requires GPU hardware for parallel acceleration. Prior works to improve the …
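
The QoS-aware placement this abstract alludes to reduces to budgeting each request's remaining deadline and picking a GPU that still fits it. A minimal sketch; the GPU names, latencies, and deadlines below are hypothetical, not taken from the paper:

```python
# Illustrative only: deadline-budget GPU selection in the spirit of QoS-aware
# inference scheduling. All names and numbers are invented.
from dataclasses import dataclass, field

@dataclass
class Gpu:
    name: str
    queue_ms: float                              # estimated delay of work already queued
    exec_ms: dict = field(default_factory=dict)  # profiled per-model latency

def pick_gpu(gpus, model, deadline_ms, elapsed_ms):
    """Pick a GPU whose queueing + execution time fits the remaining budget."""
    budget = deadline_ms - elapsed_ms
    feasible = [g for g in gpus if g.queue_ms + g.exec_ms[model] <= budget]
    if not feasible:
        return None  # deadline cannot be met; caller may scale out or reject
    # Keep the busiest feasible GPU so idler ones retain slack for
    # future requests with tighter deadlines.
    return max(feasible, key=lambda g: g.queue_ms)

gpus = [Gpu("gpu0", queue_ms=40.0, exec_ms={"resnet50": 25.0}),
        Gpu("gpu1", queue_ms=5.0,  exec_ms={"resnet50": 25.0})]
choice = pick_gpu(gpus, "resnet50", deadline_ms=100.0, elapsed_ms=10.0)
print(choice.name if choice else "reject")  # -> gpu0
```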

Jily: Cost-aware AutoScaling of heterogeneous GPU for DNN inference in public cloud

Z Wang, X Tang, Q Liu, J Han - 2019 IEEE 38th International …, 2019 - ieeexplore.ieee.org
Recently, a large number of DNN inference services have emerged in public clouds, making
the low-cost deployment of DNN inference services a hot research topic. Previous studies …
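
Cost-aware autoscaling over heterogeneous GPUs amounts to choosing the cheapest fleet that sustains the offered load. A toy sketch with two invented instance types; the prices and throughputs are made up for illustration:

```python
# Toy cost-aware fleet selection over two hypothetical GPU instance types.
import math

GPU_TYPES = {"t4": (0.35, 400.0), "v100": (2.48, 1500.0)}  # ($/hour, req/s)

def cheapest_fleet(target_rps):
    """Exhaustive search; fine for two types and small counts."""
    best_cost, best_plan = float("inf"), None
    max_t4 = math.ceil(target_rps / GPU_TYPES["t4"][1])
    for n_t4 in range(max_t4 + 1):
        remaining = max(0.0, target_rps - n_t4 * GPU_TYPES["t4"][1])
        n_v100 = math.ceil(remaining / GPU_TYPES["v100"][1])
        cost = n_t4 * GPU_TYPES["t4"][0] + n_v100 * GPU_TYPES["v100"][0]
        if cost < best_cost:
            best_cost, best_plan = cost, {"t4": n_t4, "v100": n_v100}
    return best_plan, best_cost

print(cheapest_fleet(2000))  # -> ({'t4': 5, 'v100': 0}, 1.75)
```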

iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud

F Xu, J Xu, J Chen, L Chen, R Shang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
GPUs are essential to accelerating the latency-sensitive deep neural network (DNN)
inference workloads in cloud datacenters. To fully utilize GPU resources, spatial sharing of …
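
Spatial sharing of a GPU between inference workloads is commonly realized with NVIDIA MPS, which can cap each client's share of SMs. A minimal sketch of how a provisioner might launch two workers with different shares; the worker scripts and percentages are placeholders, and an MPS daemon is assumed to be running:

```python
# Launch inference workers under NVIDIA MPS with per-process SM caps.
import os
import subprocess

def launch_worker(script, sm_percentage):
    env = os.environ.copy()
    # Caps the fraction of SMs this client may use under the MPS daemon.
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(sm_percentage)
    return subprocess.Popen(["python", script], env=env)

# e.g., give a latency-critical model 60% of SMs and a batch model 40%:
# launch_worker("serve_resnet.py", 60)
# launch_worker("serve_bert.py", 40)
```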

Sla-driven ml inference framework for clouds with heterogeneous accelerators

J Cho, D Zad Tootaghaj, L Cao… - … of Machine Learning …, 2022 - proceedings.mlsys.org
The current design of serverless computing frameworks assumes that all requests and
the underlying compute hardware are homogeneous. This homogeneity assumption causes two …
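
Dropping that homogeneity assumption means routing each request to an accelerator that meets its SLA at the lowest cost. A sketch with invented per-accelerator cost and latency profiles:

```python
# Route each request to the cheapest accelerator whose profiled latency for
# the model still meets the request's SLA. Numbers are hypothetical.
PROFILE = {            # accelerator: (cost per 1k requests in $, latency ms)
    "cpu":  (0.02, 180.0),
    "t4":   (0.10, 22.0),
    "a100": (0.45, 6.0),
}

def route(sla_ms):
    ok = [(cost, lat, name) for name, (cost, lat) in PROFILE.items() if lat <= sla_ms]
    return min(ok)[2] if ok else None  # cheapest feasible, else reject

for sla in (200, 50, 5):
    print(sla, "->", route(sla))  # cpu, t4, None
```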

D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs

A Dhakal, SG Kulkarni, KK Ramakrishnan - arXiv preprint arXiv …, 2023 - arxiv.org
Hardware accelerators such as GPUs are required for real-time, low-latency inference with
deep neural networks (DNNs). However, due to the inherent limits to the parallelism they …
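
Spatio-temporal scheduling packs models along two axes: a fraction of the GPU's SMs (space) and a slot in a scheduling window (time). A deliberately naive toy packer; the shares and durations are invented, and the overlap check is pairwise only, whereas a real scheduler tracks total SM occupancy at every instant:

```python
# Toy spatio-temporal packing of models into a fixed scheduling window.
WINDOW_MS = 10.0

def pack(models):
    """models: list of (name, duration_ms, sm_share in (0, 1])."""
    plan = []  # (name, start_ms, end_ms, sm_share)
    for name, dur, share in sorted(models, key=lambda m: -m[2]):
        start = 0.0
        for _name, _start, end, sh in plan:
            if sh + share > 1.0:        # cannot share SMs; serialize in time
                start = max(start, end)
        if start + dur > WINDOW_MS:
            raise ValueError(f"{name} does not fit in the window")
        plan.append((name, start, start + dur, share))
    return plan

for entry in pack([("det", 4.0, 0.6), ("cls", 3.0, 0.5), ("seg", 2.0, 0.3)]):
    print(entry)  # det runs 0-4, cls 4-7, seg overlaps det at 0-2
```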

Automated runtime-aware scheduling for multi-tenant DNN inference on GPU

F Yu, S Bray, D Wang, L Shangguan… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
With the rapid development of deep neural networks (DNNs), many real-world applications
adopt multiple models to perform compound tasks, such as co-running classification …
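
Runtime-aware multi-tenant scheduling can operate at operator granularity, dispatching the next operator of whichever tenant is most urgent. A toy earliest-deadline-first sketch with invented operator durations:

```python
# Dispatch operators of co-running models in earliest-deadline-first order.
import heapq

def schedule(tenants):
    """tenants: dict name -> (deadline_ms, [op durations in ms]). Returns trace."""
    heap = [(dl, name, 0) for name, (dl, _) in tenants.items()]
    heapq.heapify(heap)
    clock, trace = 0.0, []
    while heap:
        dl, name, idx = heapq.heappop(heap)
        ops = tenants[name][1]
        clock += ops[idx]                      # run this tenant's next operator
        trace.append((round(clock, 1), name, idx))
        if idx + 1 < len(ops):
            heapq.heappush(heap, (dl, name, idx + 1))
    return trace

print(schedule({"detector":   (30.0, [2.0, 2.0, 2.0]),
                "classifier": (15.0, [1.0, 1.0])}))
```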

Ebird: Elastic batch for improving responsiveness and throughput of deep learning services

W Cui, M Wei, Q Chen, X Tang, J Leng… - 2019 IEEE 37th …, 2019 - ieeexplore.ieee.org
GPUs have been widely adopted to serve online deep learning-based services that have
stringent QoS requirements. However, emerging deep learning serving systems often result …
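
Elastic batching grows the batch only as far as the tightest outstanding deadline allows. A sketch against an invented profiled latency table:

```python
# Pick the largest batch size whose profiled latency fits the strictest
# remaining deadline in the queue. The latency table is hypothetical.
LAT_MS = {1: 5.0, 2: 6.0, 4: 8.0, 8: 13.0, 16: 24.0}  # profiled batch latency

def pick_batch(queued, slack_ms):
    """queued: #waiting requests; slack_ms: tightest remaining budget."""
    best = 1
    for b, lat in sorted(LAT_MS.items()):
        if b <= queued and lat <= slack_ms:
            best = b
    return best

print(pick_batch(queued=10, slack_ms=15.0))  # -> 8
```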

CODA: Improving resource utilization by slimming and co-locating DNN and CPU jobs

H Zhao, W Cui, Q Chen, J Leng, K Yu… - 2020 IEEE 40th …, 2020 - ieeexplore.ieee.org
While deep neural network (DNN) models are often trained on GPUs, many companies and
research institutes build GPU clusters that are shared by different groups. On such GPU …
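
Co-locating CPU jobs with GPU jobs is safe only while the DNN job's input pipeline keeps enough cores. A toy admission check; the core counts and headroom are hypothetical:

```python
# Admit a CPU batch job onto a GPU node only if enough cores remain for the
# DNN job's data pipeline, with headroom reserved for the OS and driver.
def can_colocate(node_cores, dnn_pipeline_cores, cpu_job_cores, reserve=2):
    free = node_cores - dnn_pipeline_cores - reserve
    return cpu_job_cores <= free

print(can_colocate(node_cores=32, dnn_pipeline_cores=12, cpu_job_cores=16))  # True
print(can_colocate(node_cores=32, dnn_pipeline_cores=12, cpu_job_cores=20))  # False
```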

QoS-aware scheduling of heterogeneous servers for inference in deep neural networks

Z Fang, T Yu, OJ Mengshoel, RK Gupta - Proceedings of the 2017 ACM …, 2017 - dl.acm.org
Deep neural networks (DNNs) are popular in diverse fields such as computer vision and
natural language processing. DNN inference tasks are emerging as a service provided by …

S³DNN: Supervised streaming and scheduling for GPU-accelerated real-time DNN workloads

H Zhou, S Bateni, C Liu - 2018 IEEE Real-Time and Embedded …, 2018 - ieeexplore.ieee.org
Deep neural networks (DNNs) are being widely applied in many advanced embedded
systems that require autonomous decision making, e.g., autonomous driving and robotics. To …
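
The streaming side of such systems builds on concurrent CUDA streams. A minimal PyTorch illustration of two inferences queued on separate streams; it requires a CUDA device, and the linear layers stand in for real models:

```python
# Queue two DNN inferences on separate CUDA streams so they may overlap on GPU.
import torch

if torch.cuda.is_available():
    dev = torch.device("cuda")
    net_a = torch.nn.Linear(1024, 1024).to(dev)
    net_b = torch.nn.Linear(1024, 1024).to(dev)
    x = torch.randn(64, 1024, device=dev)
    s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
    with torch.cuda.stream(s1):
        y_a = net_a(x)            # may overlap with the work queued on s2
    with torch.cuda.stream(s2):
        y_b = net_b(x)
    torch.cuda.synchronize()      # wait for both streams before reading results
    print(y_a.shape, y_b.shape)
else:
    print("CUDA device required for this sketch")
```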