Sciweavers

» Parallel Reinforcement Learning with Linear Function Approxi...
ICML
2009
IEEE
14 years 2 months ago
Learning linear dynamical systems without sequence information
Virtually all methods of learning dynamic systems from data start from the same basic assumption: that the learning algorithm will be provided with a sequence, or trajectory, of d...
Tzu-Kuo Huang, Jeff Schneider

Publication
14 years 4 months ago
Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration
Several approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervis...
Christos Dimitrakakis, Michail G. Lagoudakis
ML
2002
ACM
13 years 7 months ago
Technical Update: Least-Squares Temporal Difference Learning
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It h...
Justin A. Boyan
IJCNN
2006
IEEE
14 years 1 months ago
Learning to Rank by Maximizing AUC with Linear Programming
Area Under the ROC Curve (AUC) is often used to evaluate ranking performance in binary classification problems. Several researchers have approached AUC optimization by approxi...
Kaan Ataman, W. Nick Street, Yi Zhang
GECCO
2009
Springer
14 years 2 months ago
Apply ant colony optimization to Tetris
Tetris is a falling block game where the player’s objective is to arrange a sequence of different shaped tetrominoes smoothly in order to survive. In the intelligence games, ag...
Xingguo Chen, Hao Wang, Weiwei Wang, Yinghuan Shi,...