SpotServe: Serving Generative Large Language Models on Preemptible Instances

X Miao, C Shi, J Duan, X Xi, D Lin, B Cui… - arXiv preprint arXiv …, 2023 - arxiv.org
The high computational and memory requirements of generative large language models
(LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary …

A Survey on Scheduling Techniques in Computing and Network Convergence

S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The computing demand for massive applications has led to the ubiquitous deployment of
computing power. This trend results in the urgent need for higher-level computing resource …

Graft: Efficient inference serving for hybrid deep learning with SLO guarantees via DNN re-alignment

J Wu, L Wang, Q Jin, F Liu - IEEE Transactions on Parallel and …, 2023 - ieeexplore.ieee.org
Deep neural networks (DNNs) have been widely adopted for various mobile inference tasks,
yet their ever-increasing computational demands are hindering their deployment on …

AsyFunc: A high-performance and resource-efficient serverless inference system via asymmetric functions

Q Pei, Y Yuan, H Hu, Q Chen, F Liu - … of the 2023 ACM Symposium on …, 2023 - dl.acm.org
Recent advances in deep learning (DL) have spawned various intelligent cloud services
with well-trained DL models. Nevertheless, it is nontrivial to maintain the desired end-to-end …

Memory orchestration mechanisms in serverless computing: a taxonomy, review and future directions

Z Shojaee Rad, M Ghobaei-Arani, R Ahsan - Cluster Computing, 2024 - Springer
Serverless computing has become very popular in recent years due to its flexibility and cost
efficiency. Serverless computing is a cloud computing model that allows developers to write …

FaST-GShare: Enabling efficient spatio-temporal GPU sharing in serverless computing for deep learning inference

J Gu, Y Zhu, P Wang, M Chadha, M Gerndt - Proceedings of the 52nd …, 2023 - dl.acm.org
Serverless computing (FaaS) has been extensively utilized for deep learning (DL) inference
due to the ease of deployment and pay-per-use benefits. However, existing FaaS platforms …

Mira: A program-behavior-guided far memory system

Z Guo, Z He, Y Zhang - Proceedings of the 29th Symposium on …, 2023 - dl.acm.org
Far memory, where memory accesses are non-local, has become more popular in recent
years as a solution to expand memory size and avoid memory stranding. Prior far memory …

Optimus: Warming Serverless ML Inference via Inter-Function Model Transformation

Z Hong, J Lin, S Guo, S Luo, W Chen… - Proceedings of the …, 2024 - dl.acm.org
Serverless ML inference is an emerging cloud computing paradigm for low-cost, easy-to-
manage inference services. In serverless ML inference, each call is executed in a container; …

Fine-Grained Management for Microservice Applications with Lazy Configuration Distribution

N Wang, L Wang, X Li, X Qin - Electronics, 2023 - mdpi.com
Service mesh is gaining popularity as a microservice architecture paradigm due to its
lightness, transparency, and scalability. However, fully releasing configurations to the data …

Rethinking Deployment for Serverless Functions: A Performance-First Perspective

Y Li, L Zhao, Y Yang, W Qu - … of the International Conference for High …, 2023 - dl.acm.org
Serverless computing commonly adopts strong isolation mechanisms for deploying
functions, which may bring significant performance overhead because each function needs …