Fitted Q-iteration by Advantage Weighted Regression

Download www.kyb.mpg.de

Recently, fitted Q-iteration (FQI)-based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces, which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic: it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection, the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high-dimensional action spaces.
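The idea of replacing a greedy argmax with a weighted regression can be pictured with a toy example. The sketch below is not the authors' algorithm. It assumes a one-dimensional state and action, a linear policy mean mu(s) = theta0 + theta1 * s, and exponential (Boltzmann-style) weights w_i = exp(A_i / tau) on sampled actions with a hypothetical temperature `tau`; the paper's exact weighting and function approximator may differ. Instead of searching a continuous action space for the best action in each state, the policy update becomes a single weighted least-squares fit in which high-advantage actions count more.

```python
import math


def advantage_weighted_regression(states, actions, advantages, tau=1.0):
    """Fit a linear 1-D policy mean mu(s) = theta0 + theta1 * s by weighted least squares.

    Illustrative assumption: each sample gets weight exp(A_i / tau), a soft-greedy
    weighting that favours actions with higher advantage. Small tau approaches
    greedy selection; large tau approaches plain (unweighted) regression.
    """
    # Subtracting the max advantage rescales every weight by the same constant,
    # which avoids overflow and leaves the least-squares solution unchanged.
    a_max = max(advantages)
    w = [math.exp((adv - a_max) / tau) for adv in advantages]

    # Weighted normal equations for features phi(s) = [1, s]:
    # [sum w,   sum w*s  ] [theta0]   [sum w*a  ]
    # [sum w*s, sum w*s^2] [theta1] = [sum w*s*a]
    sw = sum(w)
    sws = sum(wi * s for wi, s in zip(w, states))
    swss = sum(wi * s * s for wi, s in zip(w, states))
    swa = sum(wi * a for wi, a in zip(w, actions))
    swsa = sum(wi * s * a for wi, s, a in zip(w, states, actions))

    det = sw * swss - sws * sws
    if abs(det) < 1e-12:
        raise ValueError("need at least two distinct states with non-negligible weight")
    theta0 = (swss * swa - sws * swsa) / det
    theta1 = (sw * swsa - sws * swa) / det
    return theta0, theta1


# Toy data: for each state, a grid of candidate actions in [-3, 3].
# The hypothetical advantage -(a - 2s)^2 peaks at the action a = 2s.
states, actions, advantages = [], [], []
for s in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    for k in range(13):
        a = -3.0 + 0.5 * k
        states.append(s)
        actions.append(a)
        advantages.append(-(a - 2.0 * s) ** 2)

theta0, theta1 = advantage_weighted_regression(states, actions, advantages, tau=0.05)
print(theta0, theta1)
```

With a small temperature the fitted line comes out very close to mu(s) = 2s, i.e. it recovers the high-advantage actions without any explicit maximization over actions; with a very large `tau` the weights become nearly uniform and the slope shrinks toward 0, since the candidate actions are symmetric around zero in every state.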

Gerhard Neumann, Jan Peters

Action Spaces | Continuous Actions | Information Technology | NIPS 2008 | Policy Improvement Step |

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	NIPS
Authors	Gerhard Neumann, Jan Peters

Sciweavers
