We give a new practical algorithm to compute, in finite time, a fixpoint (and often the least fixpoint) of a system of equations in the abstract numerical domains of zones and t...
Stephane Gaubert, Eric Goubault, Ankur Taly, Sarah...
In this paper, we consider the problem of planning and learning in infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative direct policy searc...
We consider the problem of rate and power allocation in a multiple-access channel. Our objective is to obtain rate and power allocation policies that maximize a general concave ut...
Ali ParandehGheibi, Atilla Eryilmaz, Asuman E. Ozd...
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
We propose a method of approximate dynamic programming for Markov decision processes (MDPs) using algebraic decision diagrams (ADDs). We produce near-optimal value functions and p...