Sciweavers

185 search results - page 34 / 37
» Simulation-Based Optimization Algorithms for Finite-Horizon ...
Sort
View
120
Voted
AAAI
2010
15 years 5 months ago
PUMA: Planning Under Uncertainty with Macro-Actions
Planning in large, partially observable domains is challenging, especially when a long-horizon lookahead is necessary to obtain a good policy. Traditional POMDP planners that plan...
Ruijie He, Emma Brunskill, Nicholas Roy
149
Voted
JMLR
2006
190views more  JMLR 2006»
15 years 3 months ago
Causal Graph Based Decomposition of Factored MDPs
We present Variable Influence Structure Analysis, or VISA, an algorithm that performs hierarchical decomposition of factored Markov decision processes. VISA uses a dynamic Bayesia...
Anders Jonsson, Andrew G. Barto
ICML
2007
IEEE
16 years 4 months ago
Conditional random fields for multi-agent reinforcement learning
Conditional random fields (CRFs) are graphical models for modeling the probability of labels given the observations. They have traditionally been trained with using a set of obser...
Xinhua Zhang, Douglas Aberdeen, S. V. N. Vishwanat...
133
Voted
QUESTA
2010
112views more  QUESTA 2010»
15 years 2 months ago
Admission control for a multi-server queue with abandonment
In a M/M/N+M queue, when there are many customers waiting, it may be preferable to reject a new arrival rather than risk that arrival later abandoning without receiving service. O...
Yasar Levent Koçaga, Amy R. Ward
129
Voted
ICML
2009
IEEE
16 years 4 months ago
Predictive representations for policy gradient in POMDPs
We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies that are based on Predictive ...
Abdeslam Boularias, Brahim Chaib-draa