In this paper we introduce Ant-Q, a family of algorithms that share many similarities with Q-learning (Watkins, 1989), and that we apply to the solution of symmetric and asymm...
We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decisio...
Because a robot’s sensors are inevitably limited in some manner, it may find itself unable to distinguish between differing state...
Communication overheads are one of the fundamental challenges in a multiprocessor system. As the number of processors on a chip increases, communication overheads and the distribu...
Katherine E. Coons, Behnam Robatmili, Matthew E. T...
We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. Optimism is usually implemented by carryin...