J Gu, M Chowdhury, KG Shin, Y Zhu, M Jeon… - Proceedings of the 16th …, 2019 - dl.acm.org
Deep learning (DL) training jobs bring some unique challenges to existing cluster managers, such as unpredictable training times, an all-or-nothing execution model, and …