Biasing Approximate Dynamic Programming with a Lower Discount Factor

Download hal.inria.fr

Most algorithms for solving Markov decision processes rely on a discount factor, which ensures their convergence. It is generally assumed that an artificially low discount factor improves the convergence rate at the cost of solution quality. We demonstrate, however, that an artificially low discount factor may significantly improve solution quality when used in approximate dynamic programming. We propose two explanations of this phenomenon. The first follows directly from the standard approximation error bounds: a lower discount factor may decrease these bounds. However, we also show that these bounds are loose, so their decrease does not entirely justify the improved solution quality. We therefore propose a second justification: when rewards are received only sporadically (as in the case of Tetris), we can derive tighter bounds, which support a significant improvement in the solution quality with a decreased discount fa...
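The trade-off behind the first explanation can be illustrated numerically. The sketch below is not taken from the paper; it assumes the textbook sup-norm bound for approximate policy iteration, whose multiplier on the per-iteration approximation error is 2γ/(1-γ)², and a simple bias bound (γ-β)/((1-γ)(1-β)) · r_max on how far a value function computed with a lower discount factor β can lie from the one computed with the true factor γ. The function names and the parameter `r_max` (largest absolute reward) are hypothetical, chosen only for this illustration.

```python
def api_error_coefficient(gamma: float) -> float:
    """Multiplier 2*gamma / (1 - gamma)**2 from the textbook sup-norm
    bound for approximate policy iteration (assumed form, not quoted
    from the paper)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 2.0 * gamma / (1.0 - gamma) ** 2


def discount_bias_bound(gamma: float, beta: float, r_max: float) -> float:
    """Upper bound on the sup-norm gap between value functions computed
    with the true discount gamma and a lower discount beta, for rewards
    bounded in absolute value by r_max (hypothetical parameter)."""
    if not 0.0 <= beta <= gamma < 1.0:
        raise ValueError("need 0 <= beta <= gamma < 1")
    return (gamma - beta) * r_max / ((1.0 - gamma) * (1.0 - beta))


if __name__ == "__main__":
    gamma, r_max = 0.99, 1.0  # illustrative values only
    for beta in (0.99, 0.95, 0.9, 0.8):
        # Lower beta shrinks the error multiplier but grows the bias term.
        print(beta, api_error_coefficient(beta),
              discount_bias_bound(gamma, beta, r_max))
```

Lowering β shrinks the first quantity quickly while the second grows, which is the tension the abstract describes; the abstract also notes that bounds of this kind are loose, so these numbers indicate direction rather than actual solution quality.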

Marek Petrik, Bruno Scherrer

Discount Factor | Information Technology | Low Discount Factor | NIPS 2008 | Solution Quality |

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	NIPS
Authors	Marek Petrik, Bruno Scherrer

Sciweavers
