Sciweavers

109 search results - page 14 / 22
» Policy teaching through reward function learning

Publication
14 years 4 months ago
Rollout Sampling Approximate Policy Iteration
Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schem...
Christos Dimitrakakis, Michail G. Lagoudakis
CDC
2008
IEEE
14 years 2 months ago
Dynamic spectrum access policies for cognitive radio
We study the problem of dynamic spectrum sensing and access in cognitive radio systems as a partially observed Markov decision process (POMDP). A group of cognitive users cooper...
Jayakrishnan Unnikrishnan, Venugopal V. Veeravalli
HICSS
2003
IEEE
14 years 27 days ago
Evolution of a Knowledge Focused Computer Supported Learning System by Ensuring Extensibility through Generalization and Replica
If sufficient attention is not paid to the information models on which Learning Platforms are based, the ability to deliver rich functionality is hindered. This paper describes the...
David White, Lesley A. Gardner, Don Sheridan
ATAL
2004
Springer
14 years 1 months ago
Communication for Improving Policy Computation in Distributed POMDPs
Distributed Partially Observable Markov Decision Problems (POMDPs) are emerging as a popular approach for modeling multiagent teamwork where a group of agents work together to joi...
Ranjit Nair, Milind Tambe, Maayan Roth, Makoto Yok...
ICML
1999
IEEE
14 years 8 months ago
Least-Squares Temporal Difference Learning
Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. thesis, Carnegie Mellon University, August 1998. (Available as Technical Report CMU-CS-...
Justin A. Boyan