Focused Real-Time Dynamic Programming for MDPs: Squeezing More Out of a Heuristic

Real-time dynamic programming (RTDP) is a heuristic search algorithm for solving MDPs. We present a modified algorithm called Focused RTDP (FRTDP) with several improvements. While RTDP maintains only an upper bound on the long-term reward function, FRTDP maintains two-sided bounds and bases the output policy on the lower bound. FRTDP guides search with a new rule for outcome selection, focusing on parts of the search graph that contribute most to uncertainty about the values of good policies. FRTDP has modified trial termination criteria that should allow it to solve some problems (within ε) that RTDP cannot. Experiments show that for all the problems we studied, FRTDP significantly outperforms RTDP and LRTDP, and converges with up to six times fewer backups than the state-of-the-art HDP algorithm.
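To make the idea of two-sided bounds concrete, here is a minimal Python sketch of a trial-based solver that keeps a lower and an upper bound on the value of each state, explores greedily on the upper bound, and reads the output policy off the lower bound. It is not FRTDP itself. The toy MDP, the discount factor GAMMA, the fixed depth cap, and the outcome-selection rule (pick the successor with the largest probability-weighted bound gap) are all assumptions made for illustration and stand in for the paper's actual outcome-selection and trial-termination rules.

```python
GAMMA = 0.95  # assumed discount factor, chosen so the bounds contract

# Hypothetical toy MDP: state -> action -> list of (probability, next_state, reward).
# State 3 is an absorbing goal with value 0; all rewards are non-positive.
MDP = {
    0: {"safe": [(1.0, 1, -2.0)],
        "risky": [(0.5, 2, -0.5), (0.5, 0, -0.5)]},
    1: {"go": [(1.0, 3, -1.0)]},
    2: {"go": [(0.9, 3, -1.0), (0.1, 0, -1.0)]},
    3: {},
}


def q_value(bound, s, a):
    """One-step lookahead value of action a in state s under the given bound."""
    return sum(p * (r + GAMMA * bound[s2]) for p, s2, r in MDP[s][a])


def backup(lower, upper, s):
    """Bellman backup of both bounds at s; returns the upper-greedy action."""
    best = max(MDP[s], key=lambda a: q_value(upper, s, a))
    new_upper = q_value(upper, s, best)
    new_lower = max(q_value(lower, s, a) for a in MDP[s])
    upper[s], lower[s] = new_upper, new_lower
    return best


def solve(start=0, epsilon=1e-3, max_trials=10_000, max_depth=50):
    # Valid initial bounds: rewards are <= 0, so 0 bounds every value from
    # above, and r_min / (1 - GAMMA) bounds every value from below.
    r_min = min(r for acts in MDP.values() for outs in acts.values()
                for _, _, r in outs)
    lower = {s: (r_min / (1 - GAMMA) if MDP[s] else 0.0) for s in MDP}
    upper = {s: 0.0 for s in MDP}

    for _ in range(max_trials):
        if upper[start] - lower[start] <= epsilon:
            break  # start state solved to within epsilon
        s, visited = start, []
        # Forward pass: follow the upper-greedy action, then the outcome
        # whose probability-weighted bound gap is largest (assumed rule).
        while MDP[s] and len(visited) < max_depth:
            visited.append(s)
            a = backup(lower, upper, s)
            _, s, _ = max(MDP[s][a],
                          key=lambda o: o[0] * (upper[o[1]] - lower[o[1]]))
        # Backward pass: propagate the tightened bounds toward the start.
        for v in reversed(visited):
            backup(lower, upper, v)

    # The output policy is greedy with respect to the lower bound.
    policy = {s: max(MDP[s], key=lambda a: q_value(lower, s, a))
              for s in MDP if MDP[s]}
    return lower, upper, policy


if __name__ == "__main__":
    lower, upper, policy = solve()
    print(policy[0], lower[0], upper[0])
```

Basing the policy on the lower bound means the value reported for the start state is one the returned policy can actually achieve, while the upper bound is used only to decide where to search.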

Trey Smith, Reid G. Simmons

AAAI 2006 | Heuristic Search Algorithm | Intelligent Agents | Modified Algorithm | State-of-the-art HDP Algorithm

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2006
Where	AAAI
Authors	Trey Smith, Reid G. Simmons
