Sciweavers

262 search results - page 38 / 53
» Bounded-Parameter Partially Observable Markov Decision Proce...
Sort
View
ML
2002
ACM
121views Machine Learning» more  ML 2002»
13 years 10 months ago
Near-Optimal Reinforcement Learning in Polynomial Time
We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decisio...
Michael J. Kearns, Satinder P. Singh
AIED
2011
Springer
13 years 2 months ago
Faster Teaching by POMDP Planning
Both human and automated tutors must infer what a student knows and plan future actions to maximize learning. Though substantial research has been done on tracking and modeling stu...
Anna N. Rafferty, Emma Brunskill, Thomas L. Griffi...
AAAI
2006
14 years 13 days ago
Compact, Convex Upper Bound Iteration for Approximate POMDP Planning
Partially observable Markov decision processes (POMDPs) are an intuitive and general way to model sequential decision making problems under uncertainty. Unfortunately, even approx...
Tao Wang, Pascal Poupart, Michael H. Bowling, Dale...
ECML
2007
Springer
14 years 5 months ago
Policy Gradient Critics
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...
Daan Wierstra, Jürgen Schmidhuber
AI
2006
Springer
14 years 2 months ago
Belief Selection in Point-Based Planning Algorithms for POMDPs
Abstract. Current point-based planning algorithms for solving partially observable Markov decision processes (POMDPs) have demonstrated that a good approximation of the value funct...
Masoumeh T. Izadi, Doina Precup, Danielle Azar