Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve. A. Agrawal, N. Kedia, A. Panwar, J. Mohan, N. Kwatra, B. S. Gulavani, et al. arXiv preprint arXiv:2403.02310, 2024. Cited by 89.
Vidur: A Large-Scale Simulation Framework for LLM Inference. A. Agrawal, N. Kedia, J. Mohan, A. Panwar, N. Kwatra, B. Gulavani, et al. Proceedings of Machine Learning and Systems 6, 351-366, 2024. Cited by 22.
Metron: Holistic Performance Evaluation Framework for LLM Inference Systems. A. Agrawal, A. Agarwal, N. Kedia, J. Mohan, S. Kundu, N. Kwatra, R. Ramjee, et al. arXiv preprint arXiv:2407.07000, 2024. Cited by 4.
Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems. A. Agrawal, A. Agarwal, N. Kedia, J. Mohan, S. Kundu, N. Kwatra, R. Ramjee, et al. arXiv preprint arXiv:2407.07000, 2024. Cited by 1.