Nanily: A QoS-Aware Scheduling for DNN Inference Workload in Clouds

X Tang, P Wang, Q Liu, W Wang, J Han - 2019 IEEE 21st International …, 2019 - computer.org
DNN inference is widely emerging as a service and must run at sub-second latency,
which requires GPU hardware for parallel acceleration. Prior works to improve the …