coordinate to find an optimal joint policy that maximises joint value. Typical algorithms
exploit additive structure in the value function, but in the fully observable multi-agent MDP
(MMDP) setting such structure is not present. We propose a new optimal solver for transition-
independent MMDPs, in which agents can affect only their own state, but their reward
depends on joint transitions. We represent these dependencies compactly in conditional …