A survey on scheduling techniques in computing and network convergence

S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The computing demands of massive applications have led to the ubiquitous deployment of
computing power. This trend results in an urgent need for higher-level computing resource …

SpotServe: Serving generative large language models on preemptible instances

X Miao, C Shi, J Duan, X Xi, D Lin, B Cui… - Proceedings of the 29th …, 2024 - dl.acm.org
The high computational and memory requirements of generative large language models
(LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary …
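
The snippet states only the goal, so as a minimal illustration of serving on preemptible capacity (not SpotServe's actual mechanism): a watcher can poll the cloud's interruption notice and checkpoint serving state before the instance is reclaimed. The endpoint below is AWS's documented spot instance-metadata path; the checkpoint/drain helper is a hypothetical placeholder.

```python
# Sketch: watch for a spot interruption notice, then checkpoint and drain.
import time
import requests

NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def checkpoint_and_drain():
    """Hypothetical: persist model/KV-cache state and stop accepting requests."""
    print("Preemption notice received: checkpointing and draining...")

def watch_for_preemption(poll_seconds: float = 5.0):
    while True:
        try:
            resp = requests.get(NOTICE_URL, timeout=1)
            if resp.status_code == 200:  # 404 means no interruption is scheduled
                checkpoint_and_drain()
                return
        except requests.RequestException:
            pass  # metadata service unreachable; retry
        time.sleep(poll_seconds)
```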

FaST-GShare: Enabling efficient spatio-temporal GPU sharing in serverless computing for deep learning inference

J Gu, Y Zhu, P Wang, M Chadha, M Gerndt - Proceedings of the 52nd …, 2023 - dl.acm.org
Serverless computing (FaaS) has been extensively utilized for deep learning (DL) inference
due to the ease of deployment and pay-per-use benefits. However, existing FaaS platforms …
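
FaST-GShare's own allocator is not shown in the snippet; as a rough sketch of spatial GPU partitioning between co-located inference functions, assuming an NVIDIA GPU with the MPS daemon running: MPS's CUDA_MPS_ACTIVE_THREAD_PERCENTAGE caps a process's share of SMs, and PyTorch can cap its share of device memory.

```python
# Sketch: confine one inference worker to ~25% of the GPU's SMs and memory.
import os

# Must be set before CUDA is initialized in this process.
os.environ["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = "25"

import torch

torch.cuda.set_per_process_memory_fraction(0.25, device=0)

model = torch.nn.Linear(1024, 1024).cuda()
with torch.inference_mode():
    out = model(torch.randn(8, 1024, device="cuda"))
```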

BladeDISC: Optimizing dynamic shape machine learning workloads via compiler approach

Z Zheng, Z Pan, D Wang, K Zhu, W Zhao… - Proceedings of the …, 2023 - dl.acm.org
Compiler optimization plays an increasingly important role in boosting the performance of
machine learning models for data processing and management. With increasingly complex …
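
BladeDISC's API is not shown in the snippet; as an analogous illustration of the dynamic-shape problem it targets, PyTorch 2.x can compile with symbolic shapes so that varying batch or sequence sizes reuse one compiled artifact instead of triggering recompilation per shape.

```python
# Sketch: compile a model once for a family of input shapes.
import torch

model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU())
compiled = torch.compile(model, dynamic=True)  # mark input shapes as dynamic

with torch.inference_mode():
    for batch in (1, 7, 32):  # different shapes, one compiled artifact
        y = compiled(torch.randn(batch, 256))
```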

Optimus: Warming Serverless ML Inference via Inter-Function Model Transformation

Z Hong, J Lin, S Guo, S Luo, W Chen… - Proceedings of the …, 2024 - dl.acm.org
Serverless ML inference is an emerging cloud computing paradigm for low-cost, easy-to-
manage inference services. In serverless ML inference, each call is executed in a container; …
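
A hypothetical sketch of the "warming via inter-function model transformation" idea the title suggests: rather than cold-starting a container and loading model B from storage, reuse a warm container that already holds model A and replace only the parts that differ. All names here are illustrative, not Optimus's API.

```python
# Sketch: derive a new serving model from one already resident in a warm container.
import torch
import torchvision.models as models

def transform_warm_model(warm: torch.nn.Module, num_classes: int) -> torch.nn.Module:
    # Reuse the resident backbone; swap only the task-specific head.
    warm.fc = torch.nn.Linear(warm.fc.in_features, num_classes)
    return warm

warm_resnet = models.resnet18(weights=None)      # already resident in the container
model_b = transform_warm_model(warm_resnet, 10)  # cheap vs. a full cold load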

Autoscaling Solutions for Cloud Applications under Dynamic Workloads

G Quattrocchi, E Incerto, R Pinciroli… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Autoscaling systems provide means to automatically change the resources allocated to a
software system according to the incoming workload and its actual needs. Public cloud …
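
As a generic illustration of "change the resources allocated according to the incoming workload" (the surveyed systems are far more sophisticated), a minimal threshold-based autoscaler loop might look as follows; the metric source and scaling backend are hypothetical placeholders.

```python
# Sketch: scale replicas up/down around utilization thresholds.
import time

def get_cpu_utilization() -> float:
    """Hypothetical metric source (e.g., a monitoring API)."""
    return 0.5

def set_replicas(n: int):
    """Hypothetical scaling backend (e.g., a cloud or Kubernetes API)."""
    print(f"scaling to {n} replicas")

def autoscale(min_r=1, max_r=10, high=0.8, low=0.3, period=30):
    replicas = min_r
    while True:
        util = get_cpu_utilization()
        if util > high and replicas < max_r:
            replicas += 1
            set_replicas(replicas)
        elif util < low and replicas > min_r:
            replicas -= 1
            set_replicas(replicas)
        time.sleep(period)
```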

In-storage domain-specific acceleration for serverless computing

R Mahapatra, S Ghodrati, BH Ahn, S Kinzer… - Proceedings of the 29th …, 2024 - dl.acm.org
While (I) serverless computing is emerging as a popular form of cloud execution,
datacenters are going through major changes: (II) storage disaggregation in the system …

Performance experiences from running an e-health inference process as FaaS across diverse clusters

G Kousiouris, A Pnevmatikakis - Companion of the 2023 ACM/SPEC …, 2023 - dl.acm.org
In this paper we report our experiences from the migration of an AI model inference process,
used in the context of an e-health platform, to the Function-as-a-Service model. To that …
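
The paper's own platform and model are not shown in the snippet; as a generic sketch of the kind of migration it describes, an inference routine is wrapped in a FaaS handler, with the model loaded at module scope so warm invocations skip the expensive load.

```python
# Sketch: an AWS-Lambda-style handler wrapping model inference.
import json

MODEL = None  # loaded once per container, reused across warm invocations

def load_model():
    """Hypothetical loader for the inference model."""
    return lambda features: sum(features)  # stand-in for a real predictor

def handler(event, context):
    global MODEL
    if MODEL is None:  # cold start: pay the load cost once
        MODEL = load_model()
    features = json.loads(event["body"])["features"]
    return {"statusCode": 200, "body": json.dumps({"prediction": MODEL(features)})}
```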

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

X Miao, G Oliaro, X Cheng, M Wu, C Unger… - arXiv preprint arXiv …, 2024 - arxiv.org
Parameter-efficient finetuning (PEFT) is a widely used technique to adapt large language
models for different tasks. Service providers typically create separate systems for users to …
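
FlexLLM's co-serving system itself is not shown in the snippet; this sketches the PEFT side it builds on, attaching a LoRA adapter with Hugging Face's peft library so that only a small set of parameters is trained per task while the base model stays shared.

```python
# Sketch: LoRA-based parameter-efficient finetuning on a shared base model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"])  # gpt2 attn proj
peft_model = get_peft_model(base, config)
peft_model.print_trainable_parameters()  # small fraction of the base model
```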

Optimizing Data Processing: A Comparative Study of Big Data Platforms in Edge, Fog, and Cloud Layers

T Shwe, M Aritsugi - Applied Sciences, 2024 - mdpi.com
Intelligent applications in several areas increasingly rely on big data solutions to improve
their efficiency, but the processing and management of big data incur high costs. Although …