Automated runtime-aware scheduling for multi-tenant DNN inference on GPU

F Yu, D Wang, L Shangguan, M Zhang, X Tang… - arXiv preprint arXiv …, 2021 - arxiv.org

Deep Learning (DL) models have achieved superior performance in many application
domains, including vision, language, medical, commercial ads, entertainment, etc. With the …

Cited by 13 · Related articles · All 3 versions

A survey of multi-tenant deep learning inference on GPU

F Yu, D Wang, L Shangguan, M Zhang, C Liu… - arXiv preprint arXiv …, 2022 - arxiv.org

Deep Learning (DL) models have achieved superior performance. Meanwhile, computing
hardware like NVIDIA GPUs also demonstrated strong computing scaling trends with 2x …

Cited by 35 · Related articles · All 5 versions

Hardware-Assisted Virtualization of Neural Processing Units for Cloud Platforms

Y Xue, Y Liu, L Nai, J Huang - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org

Cloud platforms today have been deploying hardware accelerators like neural processing
units (NPUs) for powering machine learning (ML) inference services. To maximize the …

Cited by 3 · Related articles · All 6 versions

Miriam: Exploiting elastic kernels for real-time multi-DNN inference on edge GPU

Z Zhao, N Ling, N Guan, G Xing - … of the 21st ACM Conference on …, 2023 - dl.acm.org

Many applications such as autonomous driving and augmented reality, require the
concurrent running of multiple deep neural networks (DNN) that poses different levels of real …

Cited by 16 · Related articles · All 4 versions

GMorph: Accelerating Multi-DNN Inference via Model Fusion

Q Yang, T Yang, M Xiang, L Zhang, H Wang… - Proceedings of the …, 2024 - dl.acm.org

AI-powered applications often involve multiple deep neural network (DNN)-based prediction
tasks to support application-level functionalities. However, executing multi-DNNs can be …

Cited by 3 · Related articles · All 2 versions

AccuMO: Accuracy-centric multitask offloading in edge-assisted mobile augmented reality

ZJ Kong, Q Xu, J Meng, YC Hu - Proceedings of the 29th Annual …, 2023 - dl.acm.org

Immersive applications such as Augmented Reality (AR) and Mixed Reality (MR) often need
to perform multiple latency-critical tasks on every frame captured by the camera, which all …

Cited by 9 · Related articles · All 3 versions

ROSGM: A real-time GPU management framework with plug-in policies for ROS 2

R Li, T Hu, X Jiang, L Li, W Xing… - 2023 IEEE 29th Real …, 2023 - ieeexplore.ieee.org

Robot Operating System (ROS) is a prevailing software framework for robotic application
development. Graphics Processing Unit (GPU) is widely used in many ROS applications as …

Cited by 6 · Related articles · All 3 versions

Opara: Exploiting Operator Parallelism for Expediting DNN Inference on GPUs

A Chen, F Xu, L Han, Y Dong, L Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

GPUs have become the de facto hardware devices for accelerating Deep Neural Network
(DNN) inference workloads. However, the conventional sequential execution mode of DNN …

Cited by 3 · Related articles · All 2 versions

Jigsaw: Taming BEV-centric perception on dual-SoC for autonomous driving

L Sun, C Li, X Hou, T Huang, C Xu… - 2024 IEEE Real …, 2024 - ieeexplore.ieee.org

Real-time perception is important for autonomous driving. We observe an emerging trend
using one large and critical fusion-based Bird's-Eye-View (BEV) Deep Neural Network …

Cited by 1 · Related articles · All 2 versions

Boosting DNN cold inference on edge devices

R Yi, T Cao, A Zhou, X Ma, S Wang, M Xu - Proceedings of the 21st …, 2023 - dl.acm.org

DNNs are ubiquitous on edge devices nowadays. With its increasing importance and use
cases, it's not likely to pack all DNNs into device memory and expect that each inference has …

Cited by 7 · Related articles
