We show that in a cooperative $ N $-agent network, one can design locally executable policies for the agents such that the resulting discounted sum of average rewards (value) …
We focus on multi-agent reinforcement learning in tabular average-cost settings: a team of agents sequentially interacts with the environment and observes localized incentives. The …
RV Dwaraknath, L Ying - OPT 2023: Optimization for Machine Learning - openreview.net
In this paper, we extend the connection between linear programming formulations of MDPs and policy gradient methods for infinite horizon MDPs presented in (Ying, L., & Zhu, Y …