Enabling Parallelism Hot Switching for Efficient Training of Large Language Models. H Ge, F Fu, H Li, X Wang, S Lin, Y Wang, X Nie, H Zhang, X Miao, B Cui. Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles …, 2024 | 1 | 2024 |
Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization. H Li, F Fu, H Ge, S Lin, X Wang, J Niu, Y Wang, H Zhang, X Nie, B Cui. arXiv preprint arXiv:2410.13333, 2024 | | 2024 |