Abstract. We present a new reinforcement learning approach for deterministic continuous control problems in environments with unknown, arbitrary reward functions. The difficulty of...
In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function....
There exist a number of reinforcement learning algorithms which learn by climbing the gradient of expected reward. Their long-run convergence has been proved, even in partially ob...
Abstract. Infinite-horizon multi-agent control processes with nondeterminism and partial state knowledge have particularly interesting properties with respect to adaptive control, ...
Automatically building maps from sensor data is a necessary and fundamental skill for mobile robots; as a result, considerable research attention has focused on the technical chall...