Unicron: Economizing Self-Healing LLM Training at Scale

T He, X Li, Z Wang, K Qian, J Xu, W Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Training large-scale language models is increasingly critical in various domains, but it is
hindered by frequent failures, leading to significant time and economic costs. Current failure …
