

Bandit Algorithms for Tree Search

Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go [6]. Their efficient exploration of the tree lets them rapidly return a good value and improve its precision if more time is provided. The UCT algorithm [8], a tree search method based on Upper Confidence Bounds (UCB) [2], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is “over-optimistic” in some sense, leading to a worst-case regret that may be very poor. We propose alternative bandit algorithms for tree search. First, we analyze a modification of UCT using a confidence sequence that scales exponentially in the horizon depth. We then consider Flat-UCB performed on the leaves and provide a finite regret bound with high probability. Finally, we introduce and analyze a Bandit Algorithm for Smooth Trees (BAST), which takes into account the actual smoothness of the rewards for performing efficient “cuts” of sub-optima...
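To make the idea of running a bandit directly on the leaves concrete, here is a minimal sketch of a flat UCB strategy that ignores the tree structure and treats every leaf as an independent arm. The abstract does not give the paper's confidence sequence, so this sketch assumes the textbook UCB1 bonus sqrt(2 ln t / n) and Bernoulli rewards; the names `ucb1_index`, `flat_ucb`, and `leaf_means` are hypothetical and chosen for illustration only.

```python
import math
import random


def ucb1_index(mean, pulls, t):
    """Textbook UCB1 index (an assumption, not the paper's exact bound):
    empirical mean plus an exploration bonus that shrinks with more pulls."""
    if pulls == 0:
        return float("inf")  # force every leaf to be tried at least once
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def flat_ucb(leaf_means, rounds, seed=0):
    """Run UCB1 over the leaves, treated as independent Bernoulli arms.

    leaf_means: hypothetical true success probability of each leaf.
    Returns the number of times each leaf was sampled.
    """
    rng = random.Random(seed)
    n = len(leaf_means)
    pulls = [0] * n
    totals = [0.0] * n
    for t in range(1, rounds + 1):
        indices = [
            ucb1_index(totals[i] / pulls[i] if pulls[i] else 0.0, pulls[i], t)
            for i in range(n)
        ]
        # Optimism in the face of uncertainty: sample the highest index.
        chosen = max(range(n), key=indices.__getitem__)
        reward = 1.0 if rng.random() < leaf_means[chosen] else 0.0
        pulls[chosen] += 1
        totals[chosen] += reward
    return pulls


if __name__ == "__main__":
    # Four leaves of a depth-2 binary tree; the third leaf is the best one.
    print(flat_ucb([0.2, 0.5, 0.9, 0.4], rounds=2000))
```

In this sketch the sampling effort concentrates on the leaf with the highest mean while every leaf is still visited occasionally. Because the tree is ignored, the cost of exploration grows with the number of leaves, which is the kind of limitation that tree-aware strategies such as UCT or BAST are meant to address.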
Pierre-Arnaud Coquelin, Rémi Munos
Added 13 Dec 2010
Updated 13 Dec 2010
Type Journal
Year 2007
Where CoRR