Astraea: A fair deep learning scheduler for multi-tenant GPU clusters

Z Ye, P Sun, W Gao, T Zhang, X Wang… - … on Parallel and …, 2021 - ieeexplore.ieee.org
Modern GPU clusters are designed to support distributed Deep Learning jobs from multiple
tenants concurrently. Each tenant may have varied and dynamic resource demands …