MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters Q Weng, W Xiao, Y Yu, W Wang, C Wang, J He, Y Li, L Zhang, W Lin, ... 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2022 | 190 | 2022 |
InternLM2 technical report Z Cai, M Cao, H Chen, K Chen, K Chen, X Chen, X Chen, Z Chen, Z Chen, ... arXiv preprint arXiv:2403.17297, 2024 | 48 | 2024 |
Metis: Learning to schedule long-running applications in shared container clusters at scale L Wang, Q Weng, W Wang, C Chen, B Li SC20: International Conference for High Performance Computing, Networking …, 2020 | 41 | 2020 |
Fast distributed deep learning via worker-adaptive batch sizing C Chen, Q Weng, W Wang, B Li, B Li Proceedings of the ACM Symposium on Cloud Computing, 521, 2018 | 28 | 2018 |
Semi-dynamic load balancing: Efficient distributed learning in non-dedicated environments C Chen, Q Weng, W Wang, B Li, B Li Proceedings of the 11th ACM Symposium on Cloud Computing, 431-446, 2020 | 24 | 2020 |
Opus: Fair and efficient cache sharing for in-memory data analytics Y Yu, W Wang, J Zhang, Q Weng, KB Letaief 2018 IEEE 38th International Conference on Distributed Computing Systems …, 2018 | 14 | 2018 |
Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent Q Weng, L Yang, Y Yu, W Wang, X Tang, G Yang, L Zhang 2023 USENIX Annual Technical Conference (USENIX ATC 23), 995-1008, 2023 | 13 | 2023 |
Workload consolidation in Alibaba clusters: the good, the bad, and the ugly Y Zhang, Y Yu, W Wang, Q Chen, J Wu, Z Zhang, J Zhong, T Ding, ... Proceedings of the 13th Symposium on Cloud Computing, 210-225, 2022 | 9 | 2022 |
Accelerating distributed learning in non-dedicated environments C Chen, Q Weng, W Wang, B Li, B Li IEEE Transactions on Cloud Computing 11 (1), 515-531, 2021 | 7 | 2021 |
Towards framework-independent, non-intrusive performance characterization for dataflow computation H Tian, Q Weng, W Wang Proceedings of the 10th ACM SIGOPS Asia-Pacific Workshop on Systems, 54-60, 2019 | 3 | 2019 |
CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference S Li, H Lu, T Wu, M Yu, Q Weng, X Chen, Y Shan, B Yuan, W Wang arXiv preprint arXiv:2401.11240, 2024 | 2 | 2024 |