Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task

The problem of reinforcement learning in large factored Markov decision processes is explored. The Q-value of a state-action pair is approximated by the free energy of a product of experts network. Network parameters are learned on-line using a modified SARSA algorithm which minimizes the inconsistency of the Q-values of consecutive state-action pairs. Actions are chosen based on the current value estimates by fixing the current state and sampling actions from the network using Gibbs sampling. The algorithm is tested on a co-operative multi-agent task. The product of experts model is found to perform comparably to table-based Q-learning for small instances of the task, and continues to perform well when the problem becomes too large for a table-based representation.
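The three ingredients named in the abstract (a Q-value read off as a negative free energy, a SARSA-style update on consecutive state-action pairs, and Gibbs sampling of actions with the state held fixed) can be illustrated with a rough sketch. This is not the authors' implementation: it assumes the product of experts is a restricted Boltzmann machine with binary state and action units, omits visible biases, and uses hypothetical names (`FreeEnergyQ`, `sarsa_update`, `sample_action`) and arbitrary learning-rate, discount, and temperature values chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class FreeEnergyQ:
    """Illustrative sketch: Q(s, a) = -F(s, a) for an RBM-style product of experts."""

    def __init__(self, n_state, n_action, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        # Assumed parameterisation: one weight matrix per visible group.
        self.W = 0.1 * self.rng.standard_normal((n_state, n_hidden))
        self.U = 0.1 * self.rng.standard_normal((n_action, n_hidden))
        self.b = np.zeros(n_hidden)

    def hidden_input(self, s, a):
        # Total input to each hidden unit (expert) with s and a clamped.
        return s @ self.W + a @ self.U + self.b

    def q_value(self, s, a):
        # Negative free energy with no visible biases:
        # -F(s, a) = sum_k log(1 + exp(x_k)).
        return float(np.sum(np.logaddexp(0.0, self.hidden_input(s, a))))

    def sarsa_update(self, s, a, r, s_next, a_next, lr=0.05, gamma=0.9):
        # Temporal-difference error between consecutive state-action pairs.
        td = r + gamma * self.q_value(s_next, a_next) - self.q_value(s, a)
        # dQ/dx_k = sigmoid(x_k), so the gradient is an outer product.
        h = sigmoid(self.hidden_input(s, a))
        self.W += lr * td * np.outer(s, h)
        self.U += lr * td * np.outer(a, h)
        self.b += lr * td * h
        return td

    def sample_action(self, s, n_sweeps=20, temperature=1.0):
        # Gibbs sampling with the state clamped: alternate between
        # hidden units given (s, a) and action units given the hidden units.
        a = self.rng.integers(0, 2, size=self.U.shape[0]).astype(float)
        for _ in range(n_sweeps):
            h_prob = sigmoid(self.hidden_input(s, a) / temperature)
            h = (self.rng.random(h_prob.shape) < h_prob).astype(float)
            a_prob = sigmoid((self.U @ h) / temperature)
            a = (self.rng.random(a_prob.shape) < a_prob).astype(float)
        return a


# Toy usage with made-up binary vectors.
model = FreeEnergyQ(n_state=3, n_action=2, n_hidden=4)
s = np.array([1.0, 0.0, 1.0])
a = np.array([0.0, 1.0])
s_next = np.array([0.0, 1.0, 1.0])
a_next = model.sample_action(s_next)
td_error = model.sarsa_update(s, a, r=1.0, s_next=s_next, a_next=a_next)
```

In this sketch, running the Gibbs chain at a given temperature draws actions approximately in proportion to exp(Q(s, a) / temperature), so higher-valued actions are sampled more often without enumerating the full action space.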

Brian Sallans, Geoffrey E. Hinton

Consecutive State-action Pairs | Large Factored Markov | NIPS 2000 | NIPS 2007 | State-action Pairs |

Added	01 Nov 2010
Updated	01 Nov 2010
Type	Conference
Year	2000
Where	NIPS
Authors	Brian Sallans, Geoffrey E. Hinton
