SpecInfer: Accelerating generative LLM serving with speculative inference and token tree verification

X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang… - arXiv preprint arXiv …, 2023 - cs.cmu.edu
The high computational and memory requirements of generative large language models
(LLMs) make it challenging to serve them quickly and cheaply. This paper introduces …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

Deep Learning Workload Scheduling in GPU Datacenters: A Survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

SpecInfer: Accelerating large language model serving with tree-based speculative inference and verification

X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang… - Proceedings of the 29th …, 2024 - dl.acm.org
This paper introduces SpecInfer, a system that accelerates generative large language model
(LLM) serving with tree-based speculative inference and verification. The key idea behind …

Toward sustainable GenAI using generation directives for carbon-friendly large language model inference

B Li, Y Jiang, V Gadepally, D Tiwari - arXiv preprint arXiv:2403.12900, 2024 - arxiv.org
The rapid advancement of Generative Artificial Intelligence (GenAI) across diverse sectors
raises significant environmental concerns, notably the carbon emissions from their cloud …

A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …

Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances

J Duan, Z Song, X Miao, X Xi, D Lin, H Xu… - … USENIX Symposium on …, 2024 - usenix.org
Deep neural networks (DNNs) are becoming progressively large and costly to train. This
paper aims to reduce DNN training costs by leveraging preemptible instances on modern …

Efficient training and inference: Techniques for large language models using llama

SR Cunningham, D Archambault, A Kung - Authorea Preprints, 2024 - techrxiv.org
Enhancing the efficiency of language models involves optimizing their training and
inference processes to reduce computational demands while maintaining high performance …