where nodes are active sporadically. Each active node should properly control its action to
maximize the global performance of the network, which is characterized by a pre-defined
utility function. We consider a challenging situation in which the optimization algorithm must
rely only on a scalar approximation of the utility function, rather than on its
closed-form expression, so that the typical gradient descent method cannot be applied. This …