A survey on scheduling techniques in computing and network convergence

S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The computing demands of massive applications have led to the ubiquitous deployment of
computing power. This trend results in an urgent need for higher-level computing resource …

SpotServe: Serving generative large language models on preemptible instances

X Miao, C Shi, J Duan, X Xi, D Lin, B Cui… - Proceedings of the 29th …, 2024 - dl.acm.org
The high computational and memory requirements of generative large language models
(LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary …
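
The snippet states only the goal, so as a minimal illustration of serving on preemptible capacity (not SpotServe's actual mechanism): a watcher can poll the cloud's interruption notice and checkpoint serving state before the instance is reclaimed. The endpoint below is AWS's documented spot instance-metadata path; the checkpoint/drain helper is a hypothetical placeholder.

```python
# Sketch: watch for a spot interruption notice, then checkpoint and drain.
import time
import requests

NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def checkpoint_and_drain():
    """Hypothetical: persist model/KV-cache state and stop accepting requests."""
    print("Preemption notice received: checkpointing and draining...")

def watch_for_preemption(poll_seconds: float = 5.0):
    while True:
        try:
            resp = requests.get(NOTICE_URL, timeout=1)
            if resp.status_code == 200:  # 404 means no interruption is scheduled
                checkpoint_and_drain()
                return
        except requests.RequestException:
            pass  # metadata service unreachable; retry
        time.sleep(poll_seconds)
```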

FaST-GShare: Enabling efficient spatio-temporal GPU sharing in serverless computing for deep learning inference

J Gu, Y Zhu, P Wang, M Chadha, M Gerndt - Proceedings of the 52nd …, 2023 - dl.acm.org
Serverless computing (FaaS) has been extensively utilized for deep learning (DL) inference
due to the ease of deployment and pay-per-use benefits. However, existing FaaS platforms …
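
FaST-GShare's own allocator is not shown in the snippet; as a rough sketch of spatial GPU partitioning between co-located inference functions, assuming an NVIDIA GPU with the MPS daemon running: MPS's CUDA_MPS_ACTIVE_THREAD_PERCENTAGE caps a process's share of SMs, and PyTorch can cap its share of device memory.

```python
# Sketch: confine one inference worker to ~25% of the GPU's SMs and memory.
import os

# Must be set before CUDA is initialized in this process.
os.environ["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = "25"

import torch

torch.cuda.set_per_process_memory_fraction(0.25, device=0)

model = torch.nn.Linear(1024, 1024).cuda()
with torch.inference_mode():
    out = model(torch.randn(8, 1024, device="cuda"))
```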

BladeDISC: Optimizing dynamic shape machine learning workloads via compiler approach

Z Zheng, Z Pan, D Wang, K Zhu, W Zhao… - Proceedings of the …, 2023 - dl.acm.org
Compiler optimization plays an increasingly important role in boosting the performance of
machine learning models for data processing and management. With increasingly complex …
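
BladeDISC's API is not shown in the snippet; as an analogous illustration of the dynamic-shape problem it targets, PyTorch 2.x can compile with symbolic shapes so that varying batch or sequence sizes reuse one compiled artifact instead of triggering recompilation per shape.

```python
# Sketch: compile a model once for a family of input shapes.
import torch

model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU())
compiled = torch.compile(model, dynamic=True)  # mark input shapes as dynamic

with torch.inference_mode():
    for batch in (1, 7, 32):  # different shapes, one compiled artifact
        y = compiled(torch.randn(batch, 256))
```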

Optimus: Warming Serverless ML Inference via Inter-Function Model Transformation

Z Hong, J Lin, S Guo, S Luo, W Chen… - Proceedings of the …, 2024 - dl.acm.org
Serverless ML inference is an emerging cloud computing paradigm for low-cost, easy-to-
manage inference services. In serverless ML inference, each call is executed in a container; …
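
A hypothetical sketch of the "warming via inter-function model transformation" idea the title suggests: rather than cold-starting a container and loading model B from storage, reuse a warm container that already holds model A and replace only the parts that differ. All names here are illustrative, not Optimus's API.

```python
# Sketch: derive a new serving model from one already resident in a warm container.
import torch
import torchvision.models as models

def transform_warm_model(warm: torch.nn.Module, num_classes: int) -> torch.nn.Module:
    # Reuse the resident backbone; swap only the task-specific head.
    warm.fc = torch.nn.Linear(warm.fc.in_features, num_classes)
    return warm

warm_resnet = models.resnet18(weights=None)      # already resident in the container
model_b = transform_warm_model(warm_resnet, 10)  # cheap vs. a full cold load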

Autoscaling Solutions for Cloud Applications under Dynamic Workloads

G Quattrocchi, E Incerto, R Pinciroli… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Autoscaling systems provide means to automatically change the resources allocated to a
software system according to the incoming workload and its actual needs. Public cloud …
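
As a generic illustration of "change the resources allocated according to the incoming workload" (the surveyed systems are far more sophisticated), a minimal threshold-based autoscaler loop might look as follows; the metric source and scaling backend are hypothetical placeholders.

```python
# Sketch: scale replicas up/down around utilization thresholds.
import time

def get_cpu_utilization() -> float:
    """Hypothetical metric source (e.g., a monitoring API)."""
    return 0.5

def set_replicas(n: int):
    """Hypothetical scaling backend (e.g., a cloud or Kubernetes API)."""
    print(f"scaling to {n} replicas")

def autoscale(min_r=1, max_r=10, high=0.8, low=0.3, period=30):
    replicas = min_r
    while True:
        util = get_cpu_utilization()
        if util > high and replicas < max_r:
            replicas += 1
            set_replicas(replicas)
        elif util < low and replicas > min_r:
            replicas -= 1
            set_replicas(replicas)
        time.sleep(period)
```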

In-storage domain-specific acceleration for serverless computing

R Mahapatra, S Ghodrati, BH Ahn, S Kinzer… - Proceedings of the 29th …, 2024 - dl.acm.org
While (I) serverless computing is emerging as a popular form of cloud execution,
datacenters are going through major changes: (II) storage disaggregation in the system …

Performance experiences from running an e-health inference process as FaaS across diverse clusters

G Kousiouris, A Pnevmatikakis - Companion of the 2023 ACM/SPEC …, 2023 - dl.acm.org
In this paper we report our experiences from the migration of an AI model inference process,
used in the context of an e-health platform, to the Function-as-a-Service model. To that …
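
The paper's own platform and model are not shown in the snippet; as a generic sketch of the kind of migration it describes, an inference routine is wrapped in a FaaS handler, with the model loaded at module scope so warm invocations skip the expensive load.

```python
# Sketch: an AWS-Lambda-style handler wrapping model inference.
import json

MODEL = None  # loaded once per container, reused across warm invocations

def load_model():
    """Hypothetical loader for the inference model."""
    return lambda features: sum(features)  # stand-in for a real predictor

def handler(event, context):
    global MODEL
    if MODEL is None:  # cold start: pay the load cost once
        MODEL = load_model()
    features = json.loads(event["body"])["features"]
    return {"statusCode": 200, "body": json.dumps({"prediction": MODEL(features)})}
```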

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

X Miao, G Oliaro, X Cheng, M Wu, C Unger… - arXiv preprint arXiv …, 2024 - arxiv.org
Parameter-efficient finetuning (PEFT) is a widely used technique to adapt large language
models for different tasks. Service providers typically create separate systems for users to …
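
FlexLLM's co-serving system itself is not shown in the snippet; this sketches the PEFT side it builds on, attaching a LoRA adapter with Hugging Face's peft library so that only a small set of parameters is trained per task while the base model stays shared.

```python
# Sketch: LoRA-based parameter-efficient finetuning on a shared base model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"])  # gpt2 attn proj
peft_model = get_peft_model(base, config)
peft_model.print_trainable_parameters()  # small fraction of the base model
```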

Optimizing Data Processing: A Comparative Study of Big Data Platforms in Edge, Fog, and Cloud Layers

T Shwe, M Aritsugi - Applied Sciences, 2024 - mdpi.com
Intelligent applications in several areas increasingly rely on big data solutions to improve
their efficiency, but the processing and management of big data incur high costs. Although …