Reinforcement learning addresses the dilemma between exploration to find profitable actions and exploitation to act according to the best observations already made. Bandit problems are one such class of stateless problems that capture this explore/exploit trade-off. We propose a learning algorithm for bandit problems based on the fractional expectation of rewards acquired. The algorithm is theoretically shown to converge on an ε-optimal arm and to achieve O(n) sample complexity. Experimental results show that the algorithm incurs substantially lower regret than parameter-optimized ε-greedy and SoftMax approaches, as well as other state-of-the-art techniques with low sample complexity.
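For context on the baselines named above, the following is a minimal sketch of the two standard comparison policies, ε-greedy and SoftMax (Boltzmann) action selection, on a stochastic bandit. It is not the proposed fractional-expectation algorithm; the Bernoulli reward model, arm means, and parameter values are illustrative assumptions only.

```python
# Sketch of the baseline policies referenced in the abstract (epsilon-greedy
# and SoftMax). The reward model and parameters are assumptions for
# demonstration; this is not the paper's fractional-expectation algorithm.
import math
import random


def epsilon_greedy(estimates, epsilon):
    """With probability epsilon explore a random arm, else exploit the best estimate."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])


def softmax(estimates, temperature):
    """Sample an arm with probability proportional to exp(estimate / temperature)."""
    prefs = [math.exp(q / temperature) for q in estimates]
    total = sum(prefs)
    r, cumulative = random.random() * total, 0.0
    for arm, p in enumerate(prefs):
        cumulative += p
        if r <= cumulative:
            return arm
    return len(prefs) - 1


def run(select, true_means, steps):
    """Play a Bernoulli bandit, tracking sample-mean estimates and cumulative regret."""
    n_arms = len(true_means)
    counts, estimates = [0] * n_arms, [0.0] * n_arms
    best = max(true_means)
    regret = 0.0
    for _ in range(steps):
        arm = select(estimates)
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental sample mean
        regret += best - true_means[arm]
    return regret


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # hypothetical arm success probabilities
    print("eps-greedy regret:", run(lambda q: epsilon_greedy(q, 0.1), means, 10_000))
    print("SoftMax regret:   ", run(lambda q: softmax(q, 0.1), means, 10_000))
```

In practice the exploration parameters (ε and the temperature) must be tuned per problem, which is what "parameter-optimized" refers to in the comparison above.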