| |
Abstract:
We present a principled and efficient planning algorithm for
cooperative multiagent dynamic systems. A striking feature of our
method is that the coordination and communication between the
agents is not imposed, but derived directly from the system
dynamics and function approximation architecture. We view the
entire multiagent system as a single, large Markov decision
process (MDP), which we assume can be represented in a factored
way using a dynamic Bayesian network (DBN). The action space of
the resulting MDP is the joint action space ofthe entire set of
agents. Our approach is based on the use of factored linear value
functions as an approximation to the joint value function. This
factorization of the value function allows the agents to
coordinate their actions at runtime using a natural message
passing scheme. We provide a simple and efficient method for
computing such an approximate value function by solving a single
linear program, whose size is determined by the interaction
between the value function structure and the DBN. We thereby
avoid the exponential blowup in the state and action space. We
show that our approach compares favorably with approaches based
on reward sharing. We also show that our algorithm is an
efficient alternative to more complicated algorithms even in the
single agent case.
|