The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three...
Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are...
Brian Kulis, Sugato Basu, Inderjit S. Dhillon, Ray...
Direct policy search is a practical way to solve reinforcement learning problems involving continuous state and action spaces. The goal becomes finding policy parameters that maxi...
In this paper, we propose a practical scheme, called Non-Binary Joint Network-Channel Decoding (NB-JNCD), for reliable communication in wireless networks. It seamlessly couples cha...
Zheng Guo, Jie Huang, Bing Wang, Jun-Hong Cui, She...
We address the task of learning rankings of documents from search engine logs of user behavior. Previous work on this problem has relied on passively collected clickthrough data. ...