| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Sarathi: Efficient LLM inference by piggybacking decodes with chunked prefills | A Agrawal, A Panwar, J Mohan, N Kwatra, BS Gulavani, R Ramjee | arXiv preprint arXiv:2308.16369, 2023 | 43 | 2023 |
| Singularity: Planet-scale, preemptive and elastic scheduling of AI workloads | D Shukla, M Sivathanu, S Viswanatha, B Gulavani, R Nehme, A Agrawal, ... | arXiv preprint arXiv:2202.07848, 2022 | 30 | 2022 |
| Logan: A distributed online log parser | A Agrawal, R Karlupia, R Gupta | 2019 IEEE 35th International Conference on Data Engineering (ICDE), 1946-1951, 2019 | 30 | 2019 |
| Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve | A Agrawal, N Kedia, A Panwar, J Mohan, N Kwatra, B Gulavani, ... | 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024 | 22 | 2024 |
| Delog: A high-performance privacy preserving log filtering framework | A Agrawal, A Dixit, NA Shettar, D Kapadia, V Agrawal, R Gupta, ... | 2019 IEEE International Conference on Big Data (Big Data), 1739-1748, 2019 | 7* | 2019 |
| Vidur: A Large-Scale Simulation Framework For LLM Inference | A Agrawal, N Kedia, J Mohan, A Panwar, N Kwatra, B Gulavani, ... | Proceedings of Machine Learning and Systems 6, 351-366, 2024 | 4 | 2024 |
| Dynaquant: Compressing deep learning training checkpoints via dynamic quantization | A Agrawal, S Reddy, S Bhattamishra, VPS Nookala, V Vashishth, K Rong, ... | arXiv preprint arXiv:2306.11800, 2023 | 2 | 2023 |
| Metron: Holistic performance evaluation framework for LLM inference systems | A Agrawal, A Agarwal, N Kedia, J Mohan, S Kundu, N Kwatra, R Ramjee, ... | arXiv preprint arXiv:2407.07000, 2024 | 1 | 2024 |
| Elastically managing workers of multi-worker workloads on accelerator devices | M Sivathanu, S Viswanatha, B Gulavani, DK Shukla, RV Nehme, ... | US Patent App. 17/855,722, 2023 | | 2023 |
| Learning Digital Circuits: A Journey Through Weight Invariant Self-Pruning Neural Networks | A Agrawal, R Karlupia | arXiv preprint arXiv:1909.00052, 2019 | | 2019 |