Combining online and offline knowledge in UCT



The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is used as a default policy during Monte-Carlo simulation. Second, the UCT value function is combined with a rapid online estimate of action values. Third, the offline value function is used as prior knowledge in the UCT search tree. We evaluate these algorithms in 9 × 9 Go against GnuGo 3.7.10. The first algorithm performs better than UCT with a random simulation policy, but surprisingly, worse than UCT with a weaker, handcrafted simulation policy. The second algorithm outperforms UCT altogether. The third algorithm outperforms UCT with handcrafted prior knowledge. We combine these algorithms in MoGo, the world's strongest 9 × 9 Go program. Each technique significantly improves MoGo'...
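As a rough illustration of the third approach (an offline value function used as prior knowledge in the search tree), the sketch below seeds each action's statistics with a prior value and an equivalent visit count, then selects actions with the standard UCB1 rule. It is a minimal sketch, not the authors' implementation: the names `ActionStats`, `make_node`, `ucb1_select`, `prior_value`, and `prior_count`, and the exploration constant `c`, are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class ActionStats:
    """Running mean value and visit count for one action at a tree node."""
    value: float = 0.0  # mean return observed so far
    visits: int = 0     # number of simulations counted for this action

    def update(self, ret: float) -> None:
        # Incremental mean update after one simulation returns `ret`.
        self.visits += 1
        self.value += (ret - self.value) / self.visits


def make_node(actions, prior_value=None, prior_count=0):
    """Create per-action statistics, optionally seeded from an offline estimate.

    prior_value: hypothetical callable mapping an action to a value estimate.
    prior_count: hypothetical "equivalent experience" weight given to the prior.
    """
    node = {}
    for a in actions:
        if prior_value is None:
            node[a] = ActionStats()
        else:
            node[a] = ActionStats(value=prior_value(a), visits=prior_count)
    return node


def ucb1_select(node, c=1.0):
    """Pick the action maximising mean value plus a UCB1 exploration bonus."""
    total = sum(s.visits for s in node.values())
    best, best_score = None, -math.inf
    for a, s in node.items():
        if s.visits == 0:
            return a  # try unvisited actions before applying the bonus formula
        score = s.value + c * math.sqrt(math.log(total) / s.visits)
        if score > best_score:
            best, best_score = a, score
    return best
```

With a prior in place, the first simulations at a new node are steered toward actions the offline estimate favours, and real simulation results gradually outweigh the prior as visit counts grow. How much weight to give the prior (`prior_count` here) is a tuning choice in this sketch.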

Sylvain Gelly, David Silver


ICML 2007 | Machine Learning | Offline Value Function | UCT Algorithm | UCT Value Function |


Post Info

Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2007
Where	ICML
Authors	Sylvain Gelly, David Silver
