In large, distributed systems composed of adaptive, interactive components (agents), coordinating the agents so that the system achieves its performance objectives is a challenging proposition. The key difficulty in such systems is one of credit assignment: how to apportion credit (or blame) to a particular agent based on the performance of the entire system. This problem is prevalent in many domains, including air and ground traffic, multi-robot coordination, sensor networks, and smart power grids [1, 2]. In this article we provide a general approach to coordinating learning agents and present examples from the multi-robot coordination domain [3–5].

Many complex exploration domains (planetary exploration, search and rescue) require the use of autonomous robots, and multi-robot teams offer distinct advantages in efficiency and robustness over a single robot. These potential gains come at a cost, however: ensuring that the robots do not work at cross-purposes and that their efforts support a common, system-level objective. Directly extending single-robot approaches to multi-robot systems is difficult because the learning problem is no longer the same: the robots must learn not only "good" actions, but actions that complement one another in a constantly changing environment.

Approaches particularly well suited to multi-robot systems include using Markov Decision Processes for online mechanism design [6], developing new reinforcement learning based algorithms [7–10], and domain-based evolution [11]. In addition, forming coalitions to reduce search costs [12], multilevel learning architectures for coalition formation [13], and market-based approaches [14] have been examined. Finally, in problems with limited or no communication, devising agent-specific objective functions that implicitly include coordination components has proven very successful [3, 4].
In this article, we summarize recent advances in developing such agent-specific objective functions. Given a system-level objective function (e.g., number of areas explored), we aim to derive an objective function for each agent such that when the agents achieve their own objectives, the system objective is also achieved. For a system-level objective G(z), given as a function of the full system state z, consider the agent-specific objective function for agent i:
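A widely studied choice in this line of work, sketched here with the simplest counterfactual (variants instead replace agent i's contribution with a fixed default state), is the difference objective

    D_i(z) = G(z) - G(z_-i),

where z_-i denotes the system state with agent i's contribution removed. Because the subtracted term does not depend on agent i, any action that improves D_i also improves G, so the agent objectives remain aligned with the system objective; at the same time, D_i filters out much of the effect of the other agents, giving agent i a far cleaner learning signal than G itself.

As a concrete toy illustration (our example, not the article's: the coverage domain, function names, and agent trajectories below are assumptions), the following sketch computes G and D_i for a small exploration team, where G counts the distinct cells the team has covered:

    # Toy sketch of difference objectives: G counts distinct cells visited by the team.
    # Dropping agent i's observations yields the counterfactual state z_-i, so
    # D_i = G(z) - G(z_-i) is agent i's marginal contribution to coverage.

    def G(visited_by_agent):
        """System-level objective: number of distinct cells covered by the team."""
        covered = set()
        for cells in visited_by_agent.values():
            covered |= cells
        return len(covered)

    def difference_objective(visited_by_agent, i):
        """D_i(z) = G(z) - G(z_-i): credit agent i only for cells no one else covers."""
        z_minus_i = {j: c for j, c in visited_by_agent.items() if j != i}
        return G(visited_by_agent) - G(z_minus_i)

    # Three robots exploring cells on a line; robots 0 and 1 both cover cell 2.
    z = {0: {0, 1, 2}, 1: {2, 3}, 2: {4, 5, 6}}
    print(G(z))                               # 7 cells covered in total
    for i in z:
        print(i, difference_objective(z, i))  # 0 -> 2, 1 -> 1, 2 -> 3

Note that robot 1 is credited only for cell 3, because cell 2 is also covered by robot 0: overlapping effort earns no marginal credit, which is precisely the coordination pressure an agent-specific objective of this form creates.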