Clover: Toward Sustainable AI with Carbon-Aware Machine Learning Inference Service

B Li, S Samsi, V Gadepally, D Tiwari - Proceedings of the International …, 2023 - dl.acm.org
This paper presents a solution to the challenge of mitigating carbon emissions from hosting
large-scale machine learning (ML) inference services. ML inference is critical to modern …
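
The trade-off named in the title, carbon-aware inference, can be illustrated with a small sketch. The model variants, energy figures, and trade-off weight below are invented placeholders, not Clover's actual mechanism: as grid carbon intensity rises, the selector shifts toward smaller, lower-energy model variants.

```python
from dataclasses import dataclass

@dataclass
class ModelVariant:
    name: str
    accuracy: float            # validation accuracy (illustrative)
    energy_kwh_per_1k: float   # energy per 1,000 requests (illustrative)

VARIANTS = [
    ModelVariant("resnet152", 0.78, 0.40),
    ModelVariant("resnet50",  0.76, 0.15),
    ModelVariant("resnet18",  0.70, 0.05),
]

def pick_variant(carbon_g_per_kwh: float, tradeoff: float = 0.0005) -> ModelVariant:
    """Score each variant by accuracy minus a penalty proportional to its
    estimated emissions at the current grid carbon intensity."""
    def score(v: ModelVariant) -> float:
        grams_per_1k = v.energy_kwh_per_1k * carbon_g_per_kwh
        return v.accuracy - tradeoff * grams_per_1k
    return max(VARIANTS, key=score)

print(pick_variant(100).name)   # clean grid -> resnet152
print(pick_variant(600).name)   # dirty grid -> resnet50
```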

Kairos: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources

B Li, S Samsi, V Gadepally, D Tiwari - Proceedings of the 32nd …, 2023 - dl.acm.org
Online inference is becoming a key service product for many businesses, deployed in cloud
platforms to meet customer demands. Despite their revenue-generation capability, these …
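
As a rough sketch of the cost-efficiency problem Kairos targets (not its actual algorithm), one can brute-force over small mixes of heterogeneous instance types and keep the cheapest mix that meets a target throughput. Instance names and throughput/price figures below are made up.

```python
import math
from itertools import product

INSTANCE_TYPES = {            # name: (queries/sec, $/hour) -- invented figures
    "g5.xlarge":   (410.0, 1.006),
    "g4dn.xlarge": (220.0, 0.526),
    "c6i.2xlarge": ( 90.0, 0.340),
}

def provision(target_qps: float) -> tuple[dict, float]:
    """Exhaustively search small instance mixes; return the cheapest mix
    whose aggregate throughput covers target_qps."""
    names = list(INSTANCE_TYPES)
    caps = [math.ceil(target_qps / INSTANCE_TYPES[n][0]) for n in names]
    best_mix, best_cost = {}, float("inf")
    for counts in product(*(range(c + 1) for c in caps)):
        qps  = sum(k * INSTANCE_TYPES[n][0] for n, k in zip(names, counts))
        cost = sum(k * INSTANCE_TYPES[n][1] for n, k in zip(names, counts))
        if qps >= target_qps and cost < best_cost:
            best_mix, best_cost = dict(zip(names, counts)), cost
    return best_mix, best_cost

print(provision(500))   # e.g. one g5.xlarge plus one c6i.2xlarge covers 500 QPS
```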

TCB: Accelerating Transformer Inference Services with Request Concatenation

B Fu, F Chen, P Li, D Zeng - … of the 51st International Conference on …, 2022 - dl.acm.org
The Transformer has dominated the field of natural language processing because of its strong
capability to learn from sequential input data. In recent years, various computing and …
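
The mechanism named in the title, request concatenation, amounts to merging several queued requests into one padded batch so the transformer runs once rather than once per request. The sketch below shows only that generic padding step; the pad id and tensor layout are assumptions, not TCB's interfaces.

```python
import torch

PAD_ID = 0   # assumed padding token id

def concat_requests(requests: list[list[int]]) -> tuple[torch.Tensor, torch.Tensor]:
    """Pad variable-length token-id lists to a common length and return
    (input_ids, attention_mask) for a single forward pass."""
    max_len = max(len(r) for r in requests)
    ids  = torch.full((len(requests), max_len), PAD_ID, dtype=torch.long)
    mask = torch.zeros(len(requests), max_len, dtype=torch.long)
    for i, r in enumerate(requests):
        ids[i, :len(r)]  = torch.tensor(r)
        mask[i, :len(r)] = 1
    return ids, mask

ids, mask = concat_requests([[101, 7592, 102], [101, 2088, 999, 102]])
print(ids.shape)   # torch.Size([2, 4]) -- one batched pass instead of two
```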

Characterizing Multi-Instance GPU for Machine Learning Workloads

B Li, V Gadepally, S Samsi… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
As machine learning (ML) becomes increasingly popular, datacenter operators use
hardware accelerators such as GPUs to tackle the high computation demand of ML …
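
Characterization studies like this one boil down to running a fixed workload on each hardware configuration and comparing. A minimal sketch, assuming PyTorch and an NVIDIA GPU (not the paper's harness): time dense matmuls on whatever device CUDA_VISIBLE_DEVICES exposes, then repeat per MIG slice.

```python
import time
import torch

def matmul_tflops(n: int = 4096, iters: int = 50) -> float:
    """Achieved TFLOP/s for n x n matrix multiplies on the visible device."""
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    torch.cuda.synchronize()                 # exclude setup from timing
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()                 # wait for all queued kernels
    elapsed = time.perf_counter() - start
    return (2 * n ** 3 * iters) / elapsed / 1e12

if __name__ == "__main__":
    # Run once per configuration, e.g. CUDA_VISIBLE_DEVICES=MIG-<uuid> python bench.py
    print(f"{matmul_tflops():.2f} TFLOP/s on {torch.cuda.get_device_name(0)}")
```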

SMLT: A Serverless Framework for Scalable and Adaptive Machine Learning Design and Training

A Ali, S Zawad, P Aditya, IE Akkus, R Chen… - arXiv preprint arXiv …, 2022 - arxiv.org
In today's production machine learning (ML) systems, models are continuously trained,
improved, and deployed. ML design and training are becoming a continuous workflow of …
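
Serverless training frameworks must cope with stateless, time-bounded function invocations. The sketch below shows only that generic pattern (checkpoint in, bounded work, checkpoint out); the handler shape and local-file checkpointing are illustrative stand-ins, not SMLT's API.

```python
import os
import torch
import torch.nn as nn

CKPT = "model.ckpt"   # a real deployment would use object storage, not a local file

def train_handler(num_batches: int = 10) -> None:
    """One serverless invocation: restore state, do bounded work, persist state."""
    model = nn.Linear(16, 1)
    if os.path.exists(CKPT):
        model.load_state_dict(torch.load(CKPT))
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(num_batches):             # bounded so we finish before timeout
        x = torch.randn(32, 16)
        loss = ((model(x) - x.sum(dim=1, keepdim=True)) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    torch.save(model.state_dict(), CKPT)     # persist before the function exits

train_handler()   # each call resumes where the previous invocation stopped
```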

Graph3PO: A Temporal Graph Data Processing Method for Latency QoS Guarantee in Object Cloud Storage System

W Zhang, Z Shi, Z Liao, Y Li, Y Du, Y Wu… - Proceedings of the …, 2023 - dl.acm.org
Object cloud storage systems are deployed with diverse applications that have varying
latency service level objectives (SLOs), posing challenges for supporting quality of service …

Cost-Efficient Serverless Inference Serving with Joint Batching and Multi-Processing

S Cai, Z Zhou, K Zhao, X Chen - Proceedings of the 14th ACM SIGOPS …, 2023 - dl.acm.org
With the emergence of machine learning, many commercial companies increasingly use
machine learning inference systems as backend services to improve their products …
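
The title names two levers, batching and multi-processing; a toy sketch of how they compose (group requests into batches, serve them from several worker processes) follows. The doubling "model" and all sizes are placeholders, not the paper's system.

```python
import multiprocessing as mp

def worker(batch_queue):
    while True:
        batch = batch_queue.get()
        if batch is None:                       # sentinel: shut down
            return
        results = [x * 2 for x in batch]        # stand-in for model inference
        print(f"{mp.current_process().name} served {len(results)} requests")

def serve(requests, batch_size=4, num_workers=2):
    q = mp.Queue()
    procs = [mp.Process(target=worker, args=(q,)) for _ in range(num_workers)]
    for p in procs:
        p.start()
    for i in range(0, len(requests), batch_size):   # batching
        q.put(requests[i:i + batch_size])
    for _ in procs:                                 # one sentinel per worker
        q.put(None)
    for p in procs:
        p.join()

if __name__ == "__main__":
    serve(list(range(10)))
```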

Dash: Scheduling Deep Learning Workloads on Multi-Generational GPU-Accelerated Clusters

B Li, T Patel, V Gadepally, K Gettings… - 2022 IEEE High …, 2022 - ieeexplore.ieee.org
Two notable characteristics of modern GPU-accelerated HPC clusters are: (1) they
increasingly run deep learning (DL) model-training workloads, and (2) they consist of …
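
One intuition behind multi-generational scheduling is that jobs speed up unevenly across GPU generations, so the newest GPUs should go to the jobs that gain the most from them. The sketch below encodes only that heuristic; the speedup table is invented and this is not Dash's actual policy.

```python
SPEEDUP = {                 # job -> speedup of A100 over V100 (invented numbers)
    "bert-large": 2.9,
    "resnet50":   1.6,
    "pointnet":   1.2,
}

def assign(jobs: list[str], a100_slots: int) -> dict[str, str]:
    """Give the limited A100 slots to the jobs with the largest speedup."""
    ranked = sorted(jobs, key=lambda j: SPEEDUP[j], reverse=True)
    return {j: ("A100" if i < a100_slots else "V100") for i, j in enumerate(ranked)}

print(assign(["resnet50", "bert-large", "pointnet"], a100_slots=1))
# -> {'bert-large': 'A100', 'resnet50': 'V100', 'pointnet': 'V100'}
```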

ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs

X Hui, Y Xu, Z Guo, X Shen - arXiv preprint arXiv:2404.16812, 2024 - arxiv.org
Recent years have witnessed increasing interest in machine learning inference on
serverless computing for its auto-scaling and cost-effectiveness properties. Existing serverless …
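
The GPU-sharing aspect named in the title can be illustrated with a toy packing pass (not ESG's scheduler): workflow stages request fractional GPU shares, and a first-fit pass co-locates stages on shareable GPUs so partial capacity is not wasted. Stage names and demands are invented.

```python
STAGES = [("wfA/preproc", 0.2), ("wfA/model", 0.6),
          ("wfB/model", 0.5), ("wfB/post", 0.3)]

def pack(stages, num_gpus=2):
    """First-fit packing of fractional GPU demands onto shareable GPUs."""
    free = [1.0] * num_gpus            # remaining fraction on each GPU
    placement = {}
    for name, demand in stages:
        for gpu, cap in enumerate(free):
            if demand <= cap:          # first GPU with enough spare capacity
                free[gpu] -= demand
                placement[name] = f"gpu{gpu}"
                break
        else:
            placement[name] = "queued" # no GPU has room right now
    return placement

print(pack(STAGES))
# -> {'wfA/preproc': 'gpu0', 'wfA/model': 'gpu0', 'wfB/model': 'gpu1', 'wfB/post': 'gpu1'}
```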

mSIRM: Cost-Efficient and SLO-aware ML Load Balancing on Fog and Multi-Cloud Network

C Phalak, D Chahal, M Ramesh… - … of the 13th Workshop on AI …, 2023 - dl.acm.org
The use of intelligent sensors and edge devices for industrial automation has grown
exponentially, driven by the need to hyper-personalize applications, minimize cost, improve efficiency, and …
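
A minimal sketch of SLO-aware placement across fog and cloud tiers, assuming invented latency and cost figures (this is not mSIRM's algorithm): route each request to the cheapest tier whose expected latency still meets the SLO.

```python
TIERS = [                    # (name, expected latency in ms, relative cost)
    ("fog-edge",   15.0, 3.0),
    ("cloud-east", 60.0, 1.0),
    ("cloud-west", 95.0, 0.8),
]

def route(slo_ms: float) -> str:
    """Pick the cheapest tier that meets the latency SLO; fall back to the
    fastest tier if none does."""
    feasible = [(name, cost) for name, lat, cost in TIERS if lat <= slo_ms]
    if not feasible:
        return min(TIERS, key=lambda t: t[1])[0]
    return min(feasible, key=lambda t: t[1])[0]

print(route(slo_ms=100))  # -> cloud-west (cheapest tier under 100 ms)
print(route(slo_ms=30))   # -> fog-edge   (only the fog tier is fast enough)
```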