T He, X Li, Z Wang, K Qian, J Xu, W Yu… - arXiv e-prints, 2023 - ui.adsabs.harvard.edu
Training large-scale language models is increasingly critical in various domains, but it is
hindered by frequent failures, leading to significant time and economic costs. Current failure …