Intelligent router for LLM workloads: Improving performance through workload-aware scheduling

K Jain, A Parayil, A Mallick, E Choukse, X Qin… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Model (LLM) workloads have distinct prefill and decode phases with
different compute and memory requirements which should ideally be accounted for when …

TSM-LLM: Task Scheduling Management System for Large Language Models

Z Wen, G Zhu, Y Wang, H Luo… - 2024 5th International …, 2024 - ieeexplore.ieee.org
As large model services gain popularity, the high cost of deployment presents a challenge.
Thus, focusing on improving model efficiency in high-load scenarios while managing the …