Kan Zhu
Verified email at cs.washington.edu
Title
Cited by
Year
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Y Zhao, CY Lin, K Zhu, Z Ye, L Chen, S Zheng, L Ceze, A Krishnamurthy, ...
Proceedings of Machine Learning and Systems 6, 196-209, 2024
Cited by: 83, Year: 2024
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
J Tang, Y Zhao, K Zhu, G Xiao, B Kasikci, S Han
arXiv preprint arXiv:2406.10774, 2024
Cited by: 26, Year: 2024
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
K Kamahori, Y Gu, K Zhu, B Kasikci
arXiv preprint arXiv:2402.07033, 2024
Cited by: 10, Year: 2024
NanoFlow: Towards Optimal Large Language Model Serving Throughput
K Zhu, Y Zhao, L Zhao, G Zuo, Y Gu, D Xie, Y Gao, Q Xu, T Tang, Z Ye, ...
arXiv preprint arXiv:2408.12757, 2024
Cited by: 9, Year: 2024
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving, 2024
Y Zhao, CY Lin, K Zhu, Z Ye, L Chen, S Zheng, L Ceze, A Krishnamurthy, ...
URL https://arxiv.org/abs/2310.19102, 0
Cited by: 7
BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching
Y Zhao, S Yang, K Zhu, L Zheng, B Kasikci, Y Zhou, J Xing, I Stoica
arXiv preprint arXiv:2411.16102, 2024
Cited by: 1, Year: 2024
Can Storage Devices be Power Adaptive?
D Xie, T Stavrinos, K Zhu, S Peter, B Kasikci, T Anderson
Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File …, 2024
Year: 2024