objective function is a sum of convex and Lipschitz functions defined
on a space of state-action distributions. We introduce an iterative algorithm which we prove
to be an instance of Mirror Descent associated with a non-standard Bregman divergence, with a
convergence rate of order 1/