Learned Best-Effort LLM Serving

2 results

Intelligent router for LLM workloads: Improving performance through workload-aware scheduling

K Jain, A Parayil, A Mallick, E Choukse, X Qin… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Model (LLM) workloads have distinct prefill and decode phases with
different compute and memory requirements which should ideally be accounted for when …

Cited by 2 · All 4 versions

TSM-LLM: Task Scheduling Management System for Large Language Models

Z Wen, G Zhu, Y Wang, H Luo… - 2024 5th International …, 2024 - ieeexplore.ieee.org

As large model services gain popularity, the high cost of deployment presents a challenge.
Thus, focusing on improving model efficiency in high-load scenarios while managing the …
