Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads

C Hu, H Huang, L Xu, X Chen, J Xu, S Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer-based large language model (LLM) inference serving is now the backbone of
many cloud services. LLM inference consists of a prefill phase and a decode phase …