many benefits in a broad variety of domains where performance is critical. This technique is
a core component of most major numerical libraries (TensorFlow, PyTorch, Theano,
MXNet,...) and is successfully used to speed up and optimise many computationally-
intensive tasks. However, different design choices in each of these libraries lead to
noticeable differences in efficiency and in the way an end user writes efficient code. In this …