Efficient memory management for large language model serving with PagedAttention

W Kwon, Z Li, S Zhuang, Y Sheng, L Zheng… - Proceedings of the 29th …, 2023 - dl.acm.org
High throughput serving of large language models (LLMs) requires batching sufficiently
many requests at a time. However, existing systems struggle because the key-value cache …

Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Splitwise: Efficient generative LLM inference using phase splitting

P Patel, E Choukse, C Zhang, A Shah… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Generative large language model (LLM) applications are growing rapidly, leading to large-
scale deployments of expensive and power-hungry GPUs. Our characterization of LLM …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

Large language models understand and can be enhanced by emotional stimuli

C Li, J Wang, Y Zhang, K Zhu, W Hou, J Lian… - arXiv preprint arXiv …, 2023 - arxiv.org
Emotional intelligence significantly impacts our daily behaviors and interactions. Although
Large Language Models (LLMs) are increasingly viewed as a stride toward artificial general …

DistriFusion: Distributed parallel inference for high-resolution diffusion models

M Li, T Cai, J Cao, Q Zhang, H Cai… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models have achieved great success in synthesizing high-quality images.
However, generating high-resolution images with diffusion models is still challenging due to …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

SpotServe: Serving generative large language models on preemptible instances

X Miao, C Shi, J Duan, X Xi, D Lin, B Cui… - Proceedings of the 29th …, 2024 - dl.acm.org
The high computational and memory requirements of generative large language models
(LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary …

Infinite-LLM: Efficient LLM service for long context with DistAttention and distributed KVCache

B Lin, C Zhang, T Peng, H Zhao, W Xiao, M Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid proliferation of Large Language Models (LLMs) has been a driving force in the
growth of cloud-based LLM services, which are now integral to advancing AI applications …

Ladder: Enabling efficient low-precision deep learning computing through hardware-aware tensor transformation

L Wang, L Ma, S Cao, Q Zhang, J Xue, Y Shi… - … USENIX Symposium on …, 2024 - usenix.org
The increasing demand for improving deep learning model performance has led to a
paradigm shift in supporting low-precision computation to harness the robustness of deep …