diverse problems such as classification and decision making. Efficient support for DNNs on
CPUs, GPUs, and accelerators has become a prolific area of research, resulting in a plethora
of techniques for energy-efficient DNN inference. However, previous proposals focus on a
single execution of a DNN. Popular applications, such as speech recognition or video
classification, require multiple back-to-back executions of a DNN to process a sequence of …