W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning (DL) shows its prosperity in a wide variety of fields. The development of a DL model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …
As machine learning (ML) techniques are applied to a widening range of applications, high throughput ML inference serving has become critical for online services. Such ML inference …
Y Wang, K Chen, H Tan, K Guo - Proceedings of the Eighteenth …, 2023 - dl.acm.org
Today's trend of building ever larger language models (LLMs), while pushing the performance of natural language processing, adds significant latency to the inference stage …
Generating texts with a large language model (LLM) consumes massive amounts of memory. Apart from the already-large model parameters, the key/value (KV) cache that …
The availability of vast video collections and the accuracy of ML models has generated significant interest in video analytics systems. Since naively processing all frames using …
S Deng, H Zhao, B Huang, C Zhang… - Proceedings of the …, 2024 - ieeexplore.ieee.org
The development of cloud computing delivery models inspires the emergence of cloud-native computing. Cloud-native computing, as the most influential development principle for …
K Zhao, Z Zhou, X Chen, R Zhou… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
The accelerating convergence of artificial intelligence and edge computing has sparked a recent wave of interest in edge intelligence. While pilot efforts focused on edge DNN …
Z Wen, Y Wang, F Liu - IEEE INFOCOM 2022-IEEE Conference …, 2022 - ieeexplore.ieee.org
Function-as-a-Service (FaaS) offers a fine-grained resource provision model, enabling developers to build highly elastic cloud applications. User requests are handled by a series …
High-definition live streaming has experienced tremendous growth. However, the video quality of live video is often limited by the streamer's uplink bandwidth. Recently, neural …