Probabilistic planning problems are typically modeled as a Markov Decision Process (MDP). MDPs, while an otherwise expressive model, allow only for sequential, non-durative action...
One-clock priced timed games is a class of two-player, zero-sum, continuous-time games that was defined and thoroughly studied in previous works. We show that One-clock priced ti...
Thomas Dueholm Hansen, Rasmus Ibsen-Jensen, Peter ...
In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of ...
Takamitsu Matsubara, Tetsuro Morimura, Jun Morimot...
Although many real-world stochastic planning problems are more naturally formulated by hybrid models with both discrete and continuous variables, current state-of-the-art methods ...
Carlos Guestrin, Milos Hauskrecht, Branislav Kveto...
Partially observable Markov decision processes (POMDPs) are widely used for planning under uncertainty. In many applications, the huge size of the POMDP state space makes straightf...
Joni Pajarinen, Jaakko Peltonen, Ari Hottinen, Mik...