Abstract— This paper proposes a simulation-based active policy learning algorithm for finite-horizon, partially-observed sequential decision processes. The algorithm is tested i...
Ruben Martinez-Cantin, Nando de Freitas, Arnaud Do...
We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the ...
A drama manager (DM) monitors an interactive experience, such as a computer game, and intervenes to shape the global experience so it satisfies the author's expressive goals ...
When facing dynamic optimization problems the goal is no longer to find the extrema, but to track their progression through the space as closely as possible. Over these kind of ov...
Carlos Fernandes, Agostinho C. Rosa, Vitorino Ramo...
Finite state transducers (FSTs) are finite state machines that map strings in a source domain into strings in a target domain. While there are many reports in the literature of ev...