Sciweavers

ICML
2007
IEEE
15 years 11 days ago
Combining online and offline knowledge in UCT
The UCT algorithm learns a value function online using sample-based search. The TD() algorithm can learn a value function offline for the on-policy distribution. We consider three...
Sylvain Gelly, David Silver