M Zhao, N Agarwal, A Basant, B Gedik, S Pan… - arXiv preprint arXiv …, 2021 - arxiv.org
Datacenter-scale AI training clusters consisting of thousands of domain-specific accelerators
(DSA) are used to train increasingly-complex deep learning models. These clusters rely on a …