The paper explores a very simple agent design method called Q-decomposition, wherein a complex agent is built from simpler subagents. Each subagent has its own reward function and runs its own reinforcement learning process. It supplies to a central arbitrator the Q-values (according to its own reward function) for each possible action. The arbitrator selects an action maximizing the sum of Q-values from all the subagents. This approach has advantages over designs in which subagents recommend actions. It also has the property that if each subagent runs the Sarsa reinforcement learning algorithm to learn its local Q-function, then a globally optimal policy is achieved. (On the other hand, local Q-learning leads to globally suboptimal behavior.) In some cases, this form of agent decomposition allows the local Q-functions to be expressed by much-reduced state and action spaces. These results are illustrated in two domains that require effective coordination of behaviors.
Stuart J. Russell, Andrew Zimdars
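
The sketch below illustrates the structure described in the abstract: each subagent runs Sarsa on its own reward channel, and an arbitrator picks the action maximizing the summed Q-values. The toy environment (`ToyEnv`), its two reward channels, and all hyperparameters are illustrative assumptions, not from the paper; only the Sarsa-plus-additive-arbitrator structure follows the abstract.

```python
# Minimal sketch of Q-decomposition with Sarsa subagents.
# ToyEnv, its reward channels, and all hyperparameters are assumptions for illustration.

import random
from collections import defaultdict


class ToyEnv:
    """Tiny chain MDP with two reward channels, one per subagent (assumed example)."""
    N_STATES = 5
    ACTIONS = (-1, +1)  # move left / right along the chain

    def reset(self):
        self.s = 2
        return self.s

    def step(self, a):
        self.s = max(0, min(self.N_STATES - 1, self.s + a))
        # Subagent 0 is rewarded at the left end, subagent 1 at the right end.
        rewards = (1.0 if self.s == 0 else 0.0,
                   2.0 if self.s == self.N_STATES - 1 else 0.0)
        return self.s, rewards


class SarsaSubagent:
    """Learns Q_i for its own reward channel, bootstrapping on the arbitrator's actions."""

    def __init__(self, actions, alpha=0.1, gamma=0.95):
        self.Q = defaultdict(float)
        self.actions, self.alpha, self.gamma = actions, alpha, gamma

    def q_values(self, s):
        # Report this subagent's Q-values for every action to the arbitrator.
        return {a: self.Q[(s, a)] for a in self.actions}

    def update(self, s, a, r, s_next, a_next):
        # On-policy (Sarsa) target: use the action the arbitrator actually took next,
        # not this subagent's locally greedy action (which would be local Q-learning).
        target = r + self.gamma * self.Q[(s_next, a_next)]
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])


def arbitrate(subagents, s, actions, epsilon=0.1):
    """Select the action maximizing the sum of subagent Q-values (epsilon-greedy)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: sum(sa.q_values(s)[a] for sa in subagents))


def run(episodes=200, steps_per_episode=30):
    env = ToyEnv()
    subagents = [SarsaSubagent(env.ACTIONS), SarsaSubagent(env.ACTIONS)]
    for _ in range(episodes):
        s = env.reset()
        a = arbitrate(subagents, s, env.ACTIONS)
        for _ in range(steps_per_episode):
            s_next, rewards = env.step(a)
            a_next = arbitrate(subagents, s_next, env.ACTIONS)
            for sa, r in zip(subagents, rewards):
                sa.update(s, a, r, s_next, a_next)
            s, a = s_next, a_next
    return subagents


if __name__ == "__main__":
    run()
```

The key design point mirrored here is the on-policy update: each subagent evaluates the arbitrator's joint policy rather than its own greedy policy, which is why Sarsa-based subagents can converge to a globally optimal policy while local Q-learning need not.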