Tiresias: A GPU cluster manager for distributed deep learning

J Gu, M Chowdhury, KG Shin, Y Zhu, M Jeon… - … USENIX Symposium on …, 2019 - usenix.org
Deep learning (DL) training jobs bring some unique challenges to existing cluster
managers, such as unpredictable training times, an all-or-nothing execution model, and …

Tiresias: a GPU cluster manager for distributed deep learning

J Gu, M Chowdhury, KG Shin, Y Zhu, M Jeon… - Proceedings of the 16th …, 2019 - dl.acm.org

[PDF] Tiresias: A GPU Cluster Manager for Distributed Deep Learning

J Gu, M Chowdhury, KG Shin, Y Zhu, M Jeon, J Qian… - eecs.umich.edu, usenix.org, mosharaf.com, usenix.net, rtcl.eecs.umich.edu, gujuncheng.info, scholar.archive.org, sands.kaust.edu.sa