Sciweavers

185 search results - page 34 / 37
» Simulation-Based Optimization Algorithms for Finite-Horizon ...
AAAI
2010
13 years 9 months ago
PUMA: Planning Under Uncertainty with Macro-Actions
Planning in large, partially observable domains is challenging, especially when a long-horizon lookahead is necessary to obtain a good policy. Traditional POMDP planners that plan...
Ruijie He, Emma Brunskill, Nicholas Roy
JMLR
2006
13 years 7 months ago
Causal Graph Based Decomposition of Factored MDPs
We present Variable Influence Structure Analysis, or VISA, an algorithm that performs hierarchical decomposition of factored Markov decision processes. VISA uses a dynamic Bayesia...
Anders Jonsson, Andrew G. Barto
ICML
2007
IEEE
14 years 8 months ago
Conditional random fields for multi-agent reinforcement learning
Conditional random fields (CRFs) are graphical models for modeling the probability of labels given the observations. They have traditionally been trained using a set of obser...
Xinhua Zhang, Douglas Aberdeen, S. V. N. Vishwanat...
QUESTA
2010
13 years 6 months ago
Admission control for a multi-server queue with abandonment
In an M/M/N+M queue, when there are many customers waiting, it may be preferable to reject a new arrival rather than risk that arrival later abandoning without receiving service. O...
Yasar Levent Koçaga, Amy R. Ward
ICML
2009
IEEE
14 years 8 months ago
Predictive representations for policy gradient in POMDPs
We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies that are based on Predictive ...
Abdeslam Boularias, Brahim Chaib-draa