Sciweavers

87 search results - page 11 / 18
» A policy iteration algorithm for Markov decision processes s...
FOCS
2007
IEEE
14 years 1 month ago
Approximation Algorithms for Partial-Information Based Stochastic Control with Markovian Rewards
We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according...
Sudipto Guha, Kamesh Munagala
AIPS
2007
13 years 9 months ago
Learning to Plan Using Harmonic Analysis of Diffusion Models
This paper summarizes research on a new emerging framework for learning to plan using the Markov decision process model (MDP). In this paradigm, two approaches to learning to plan...
Sridhar Mahadevan, Sarah Osentoski, Jeffrey Johns,...
ATAL
2009
Springer
14 years 2 months ago
SarsaLandmark: an algorithm for learning in POMDPs with landmarks
Reinforcement learning algorithms that use eligibility traces, such as Sarsa(λ), have been empirically shown to be effective in learning good estimated-state-based policies in pa...
Michael R. James, Satinder P. Singh
ICML
1999
IEEE
14 years 8 months ago
Least-Squares Temporal Difference Learning
Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. thesis, Carnegie Mellon University, August 1998. (Available as Technical Report CMU-CS-...
Justin A. Boyan
GECCO
2005
Springer
14 years 1 month ago
GAMM: genetic algorithms with meta-models for vision
Recent adaptive image interpretation systems can reach optimal performance for a given domain via machine learning, without human intervention. The policies are learned over an ex...
Greg Lee, Vadim Bulitko