Sciweavers

185 search results - page 30 / 37
» Simulation-Based Optimization Algorithms for Finite-Horizon ...
AAAI
2010
13 years 9 months ago
Symbolic Dynamic Programming for First-order POMDPs
Partially-observable Markov decision processes (POMDPs) provide a powerful model for sequential decision-making problems with partially-observed state and are known to have (appro...
Scott Sanner, Kristian Kersting
NIPS
2007
13 years 9 months ago
Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs
We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). O...
Ambuj Tewari, Peter L. Bartlett
NIPS
2001
13 years 9 months ago
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
VTC
2008
IEEE
14 years 2 months ago
Network Controlled Joint Radio Resource Management for Heterogeneous Networks
In this paper, we propose a way of achieving optimality in radio resource management (RRM) for heterogeneous networks. We consider a micro or femto cell with two co-loc...
Marceau Coupechoux, Jean Marc Kelif, Philippe Godl...
ATAL
2007
Springer
14 years 1 months ago
Combinatorial resource scheduling for multiagent MDPs
Optimal resource scheduling in multiagent systems is a computationally challenging task, particularly when the values of resources are not additive. We consider the combinatorial ...
Dmitri A. Dolgov, Michael R. James, Michael E. Sam...