Batch reinforcement learning in a complex domain

Temporal difference reinforcement learning algorithms are perfectly suited to autonomous agents because they learn directly from an agent's experience based on sequential actions in the environment. However, their most common algorithmic variants are relatively inefficient in their use of experience data, which in many agent-based settings can be scarce. In particular, they make just one learning "update" for each atomic experience. Batch reinforcement learning algorithms, on the other hand, aim to achieve greater data efficiency by saving experience data and using it in aggregate to make updates to the learned policy. Their success has been demonstrated in the past on simple domains like grid worlds and low-dimensional control applications like pole balancing. In this paper, we compare and contrast batch reinforcement learning algorithms with on-line algorithms based on their empirical performance in a complex, continuous, noisy, multiagent domain, namely RoboCup soccer Keepaw...
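The contrast between one update per experience and reusing stored experience in aggregate can be illustrated with a small sketch. The example below is a minimal sketch under stated assumptions, not the authors' method and not Keepaway. It uses a hypothetical five-state deterministic chain, with one-step tabular Q-learning standing in as the TD update. The helper names (`collect_experience`, `online_learning`, `batch_learning`) and the parameter values are illustrative only. Both learners see exactly the same experience. The on-line learner applies each transition once and discards it. The batch learner stores every transition and sweeps the same updates over the stored set many times.

```python
import random

# Hypothetical toy domain (assumption, not from the paper): a 5-state chain,
# start at state 0, reward 1 on reaching terminal state 4.
N_STATES = 5
ACTIONS = (-1, +1)  # move left or move right
GAMMA = 0.9         # discount factor (illustrative value)
ALPHA = 0.5         # step size (illustrative value)


def step(state, action):
    """Deterministic chain dynamics; moves off the left end stay at state 0."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done


def collect_experience(n_episodes, seed=0):
    """Gather (s, a, r, s', done) tuples with a uniformly random behavior policy."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS)
            s2, r, done = step(s, a)
            data.append((s, a, r, s2, done))
            s = s2
    return data


def fresh_q():
    """Tabular action-value estimates, initialized to zero."""
    return {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}


def q_update(q, s, a, r, s2, done):
    """One-step Q-learning (a TD method) update for a single transition."""
    target = r if done else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += ALPHA * (target - q[(s, a)])


def online_learning(data):
    """On-line TD: each experience drives exactly one update, then is discarded."""
    q = fresh_q()
    for transition in data:
        q_update(q, *transition)
    return q


def batch_learning(data, sweeps=200):
    """Batch variant: keep all experience and re-apply updates over it repeatedly."""
    q = fresh_q()
    for _ in range(sweeps):
        for transition in data:
            q_update(q, *transition)
    return q


if __name__ == "__main__":
    data = collect_experience(n_episodes=3)
    online_q = online_learning(data)
    batch_q = batch_learning(data)
    # In this chain the optimal value of moving right from state 0 is GAMMA ** 3.
    print("transitions:", len(data))
    print("on-line Q(0, right):", online_q[(0, +1)])
    print("batch   Q(0, right):", batch_q[(0, +1)])
    print("optimal Q(0, right):", GAMMA ** 3)
```

With only three episodes of data, the on-line learner's estimate for moving right from the start state stays well below the optimal value. This happens because the reward signal can travel back only one step per update. The batch learner reuses the same small data set and converges close to the optimal value. This is the kind of data-efficiency advantage the abstract attributes to batch methods, shown here only on a toy problem.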

Shivaram Kalyanakrishnan, Peter Stone

Real-time Traffic