T He, X Li, Z Wang, K Qian, J Xu, W Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Training large-scale language models is increasingly critical in various domains, but it is
hindered by frequent failures, leading to significant time and economic costs. Current failure …