Sciweavers

ICML
2008
IEEE
A worst-case comparison between temporal difference and residual gradient with linear function approximation
Residual gradient (RG) was proposed as an alternative to TD(0) for policy evaluation when function approximation is used, but there exists little formal analysis comparing them ex...
Lihong Li
ATAL
2009
Springer
SarsaLandmark: an algorithm for learning in POMDPs with landmarks
Reinforcement learning algorithms that use eligibility traces, such as Sarsa(λ), have been empirically shown to be effective in learning good estimated-state-based policies in pa...
Michael R. James, Satinder P. Singh
SIGMETRICS
2005
ACM
Nearly insensitive bounds on SMART scheduling
We define the class of SMART scheduling policies. These are policies that bias towards jobs with small remaining service times, jobs with small original sizes, or both, with the ...
Adam Wierman, Mor Harchol-Balter, Takayuki Osogami
ICRA
2010
IEEE
Exploiting domain knowledge in planning for uncertain robot systems modeled as POMDPs
We propose a planning algorithm that allows user-supplied domain knowledge to be exploited in the synthesis of information feedback policies for systems modeled as parti...
Salvatore Candido, James C. Davidson, Seth Hutchin...
RSS
2007
Active Policy Learning for Robot Planning and Exploration under Uncertainty
This paper proposes a simulation-based active policy learning algorithm for finite-horizon, partially-observed sequential decision processes. The algorithm is tested i...
Ruben Martinez-Cantin, Nando de Freitas, Arnaud Do...