AutoSched: An Adaptive Self-configured Framework for Scheduling Deep Learning Training Workloads W Gao, X Zhang, S Huang, S Guo, P Sun, Y Wen, T Zhang Proceedings of the 38th ACM International Conference on Supercomputing, 473-484, 2024 | 1 | 2024 |
Ymir: A Scheduler for Foundation Model Fine-tuning Workloads in Datacenters W Gao, W Zhuang, M Li, P Sun, Y Wen, T Zhang Proceedings of the 38th ACM International Conference on Supercomputing, 259-271, 2024 | | 2024 |
FedDSE: Distribution-aware Sub-model Extraction for Federated Learning over Resource-constrained Devices H Wang, Y Jia, M Zhang, Q Hu, H Ren, P Sun, Y Wen, T Zhang Proceedings of the ACM on Web Conference 2024, 2902-2913, 2024 | | 2024 |
Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning C Chen, X Li, Q Zhu, J Duan, P Sun, X Zhang, C Yang Proceedings of the 29th ACM International Conference on Architectural …, 2024 | 1 | 2024 |
LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism B Wu, S Liu, Y Zhong, P Sun, X Liu, X Jin arXiv preprint arXiv:2404.09526, 2024 | 1 | 2024 |
InternLM2 technical report Z Cai, M Cao, H Chen, K Chen, K Chen, X Chen, X Chen, Z Chen, Z Chen, ... arXiv preprint arXiv:2403.17297, 2024 | 22 | 2024 |
UniSched: A Unified Scheduler for Deep Learning Training Jobs with Different User Demands W Gao, Z Ye, P Sun, T Zhang, Y Wen IEEE Transactions on Computers, 2024 | 1 | 2024 |
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning Z Xi, W Chen, B Hong, S Jin, R Zheng, W He, Y Ding, S Liu, X Guo, ... arXiv preprint arXiv:2402.05808, 2024 | 2 | 2024 |
Deep Learning Workload Scheduling in GPU Datacenters: A Survey Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo, T Zhang, Y Wen ACM Computing Surveys 56 (6), 1-38, 2024 | 3 | 2024 |
InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding Q Chen, D Gu, G Wang, X Chen, YT Xiong, T Huang, Q Hu, X Jin, Y Wen, ... arXiv preprint arXiv:2401.09149, 2024 | 1 | 2024 |
Characterization of large language model development in the datacenter Q Hu, Z Ye, Z Wang, G Wang, M Zhang, Q Chen, P Sun, D Lin, X Wang, ... 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024 | 8 | 2024 |
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning Q Chen, Q Hu, Z Ye, G Wang, P Sun, Y Wen, T Zhang arXiv preprint arXiv:2311.00257, 2023 | 1 | 2023 |
Boosting distributed full-graph GNN training with asynchronous one-bit communication M Zhang, Q Hu, P Sun, Y Wen, T Zhang arXiv preprint arXiv:2303.01277, 2023 | 6 | 2023 |
Lucid: A non-intrusive, scalable and interpretable scheduler for deep learning training jobs Q Hu, M Zhang, P Sun, Y Wen, T Zhang Proceedings of the 28th ACM International Conference on Architectural …, 2023 | 12 | 2023 |
Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters Q Hu, Z Ye, M Zhang, Q Chen, P Sun, Y Wen, T Zhang 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023 | 4 | 2023 |
Titan: a scheduler for foundation model fine-tuning workloads W Gao, P Sun, Y Wen, T Zhang Proceedings of the 13th Symposium on Cloud Computing, 348-354, 2022 | 4 | 2022 |
Streaming analytics using a serverless compute system V Sood, J Liu, M Saroop, TM Ghadge, H Sharma, N Vommi, TC Olychuck, ... US Patent 11,388,210, 2022 | 1 | 2022 |
Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo, T Zhang, Y Wen arXiv preprint arXiv:2205.11913, 2022 | 23 | 2022 |
Establishment and characterization of human induced pluripotent stem cell line from a Parkinson’s disease patient harboring VPS13A gene mutation X Lu, W Wang, Y Liu, N Song, M Li, X Mu, N Zhang, Q Chen, L Jiang, ... Stem Cell Research 60, 102685, 2022 | 1 | 2022 |
A Simulation Platform for Multi-tenant Machine Learning Services on Thousands of GPUs R Liang, B He, S Yan, P Sun arXiv preprint arXiv:2201.03175, 2022 | | 2022 |