NIPS
2000

Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task

The problem of reinforcement learning in large factored Markov decision processes is explored. The Q-value of a state-action pair is approximated by the free energy of a product of experts network. Network parameters are learned on-line using a modified SARSA algorithm which minimizes the inconsistency of the Q-values of consecutive state-action pairs. Actions are chosen based on the current value estimates by fixing the current state and sampling actions from the network using Gibbs sampling. The algorithm is tested on a co-operative multi-agent task. The product of experts model is found to perform comparably to table-based Q-learning for small instances of the task, and continues to perform well when the problem becomes too large for a table-based representation.
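The abstract names three ingredients: a free-energy Q-value, a SARSA-style update, and Gibbs sampling of actions with the state held fixed. The code below is a minimal sketch of how they fit together under several assumptions. The product of experts is taken to be a restricted Boltzmann machine with binary state, action and hidden units. Q(s,a) is taken as -F(s,a), where F(s,a) = -Σ_i b_i v_i - Σ_j log(1 + exp(c_j + Σ_i W_ij v_i)) and v is the concatenation of s and a. A plain semi-gradient SARSA step stands in for the paper's modified SARSA. The class `FreeEnergyQ` and all of its method and parameter names are hypothetical, and the paper's own action encoding may differ.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class FreeEnergyQ:
    """Hypothetical RBM-style product of experts over binary state+action units."""

    def __init__(self, n_state, n_action, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.n_state, self.n_action, self.n_hidden = n_state, n_action, n_hidden
        n_vis = n_state + n_action
        self.W = [[self.rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_vis)]
        self.b = [0.0] * n_vis      # visible (state + action) biases
        self.c = [0.0] * n_hidden   # hidden biases

    def _hidden_input(self, v):
        # Total input x_j = c_j + sum_i v_i W_ij to each hidden unit.
        return [self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v)))
                for j in range(self.n_hidden)]

    def free_energy(self, s, a):
        v = list(s) + list(a)
        x = self._hidden_input(v)
        return -sum(bi * vi for bi, vi in zip(self.b, v)) - sum(softplus(xj) for xj in x)

    def q(self, s, a):
        # Assumed sign convention: low free energy means high value.
        return -self.free_energy(s, a)

    def sarsa_update(self, s, a, r, s2, a2, gamma=0.9, lr=0.01):
        # TD error between consecutive state-action pairs.
        delta = r + gamma * self.q(s2, a2) - self.q(s, a)
        v = list(s) + list(a)
        p = [sigmoid(xj) for xj in self._hidden_input(v)]
        # Gradients of Q = -F: dQ/db_i = v_i, dQ/dc_j = p_j, dQ/dW_ij = v_i * p_j.
        for i, vi in enumerate(v):
            self.b[i] += lr * delta * vi
            for j in range(self.n_hidden):
                self.W[i][j] += lr * delta * vi * p[j]
        for j in range(self.n_hidden):
            self.c[j] += lr * delta * p[j]
        return delta

    def sample_action(self, s, n_steps=20):
        # Gibbs sampling with the state clamped: alternate h ~ p(h | s, a) and
        # a ~ p(a | h). The chain's stationary distribution is p(a | s) ∝ exp(Q(s, a)).
        a = [self.rng.randint(0, 1) for _ in range(self.n_action)]
        for _ in range(n_steps):
            v = list(s) + a
            h = [1 if self.rng.random() < sigmoid(xj) else 0 for xj in self._hidden_input(v)]
            a = []
            for k in range(self.n_action):
                i = self.n_state + k
                x = self.b[i] + sum(self.W[i][j] * h[j] for j in range(self.n_hidden))
                a.append(1 if self.rng.random() < sigmoid(x) else 0)
        return a
```

A typical step would be `a = model.sample_action(s)` to choose an action, followed by `model.sarsa_update(s, a, r, s2, a2)` once the next state and action are known. No table indexed by (s, a) is ever built, which is the property the abstract relies on when the task grows too large for a table.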
Brian Sallans, Geoffrey E. Hinton
Type Conference
Year 2000
Where NIPS