X Tang, P Wang, Q Liu, W Wang, J Han - 2019 IEEE 21st International …, 2019 - computer.org
DNN inference is widely emerging as a service and must run with sub-second latency,
which requires GPU hardware for parallel acceleration. Prior works to improve the …