J Dong, B Luo, J Zhang, P Zhang, F Feng, Y Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel
training techniques, involving the deployment of thousands of GPUs to train a single model …