Splitwise: Efficient generative LLM inference using phase splitting

P Patel, E Choukse, C Zhang, A Shah… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Generative large language model (LLM) applications are growing rapidly, leading to
large-scale deployments of expensive and power-hungry GPUs. Our characterization of LLM …

Resource-efficient algorithms and systems of foundation models: A survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2024 - dl.acm.org
Large foundation models, including large language models, vision transformers, diffusion,
and LLM-based multimodal models, are revolutionizing the entire machine learning …

LLM inference unveiled: Survey and roofline model insights

Z Yuan, Y Shang, Y Zhou, Z Dong, Z Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a
unique blend of opportunities and challenges. Although the field has expanded and is …

Cost-efficient large language model serving for multi-turn conversations with CachedAttention

B Gao, Z He, P Sharma, Q Kang, D Jevdjic… - 2024 USENIX Annual …, 2024 - usenix.org
Interacting with humans through multi-turn conversations is a fundamental feature of large
language models (LLMs). However, existing LLM serving engines executing multi-turn …

LoongServe: Efficiently serving long-context large language models with elastic sequence parallelism

B Wu, S Liu, Y Zhong, P Sun, X Liu, X Jin - Proceedings of the ACM …, 2024 - dl.acm.org
The context window of large language models (LLMs) is rapidly increasing, leading to a
huge variance in resource usage between different requests as well as between different …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

The efficiency spectrum of large language models: An algorithmic survey

T Ding, T Chen, H Zhu, J Jiang, Y Zhong… - arXiv preprint arXiv …, 2023 - researchgate.net
The rapid growth of Large Language Models (LLMs) has been a driving force in
transforming various domains, reshaping the artificial general intelligence landscape …

Infinite-LLM: Efficient LLM service for long context with DistAttention and distributed KVCache

B Lin, C Zhang, T Peng, H Zhao, W Xiao, M Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid proliferation of Large Language Models (LLMs) has been a driving force in the
growth of cloud-based LLM services, which are now integral to advancing AI applications …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

Salute the classic: Revisiting challenges of machine translation in the age of large language models

J Pang, F Ye, DF Wong, D Yu, S Shi, Z Tu… - Transactions of the …, 2025 - direct.mit.edu
The evolution of Neural Machine Translation (NMT) has been significantly
influenced by six core challenges (Koehn and Knowles) that have acted as benchmarks for …