particular, we consider the situation in which a team of agents collaborates to optimize a
common cost. The goal is to obtain factored policies that determine the individual behavior
of each agent so that the resulting joint policy is optimal. The main contribution of this work is
the introduction of Logical Team Q-learning (LTQL). LTQL does not rely on assumptions
about the environment and is therefore applicable to any collaborative MARL …