We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) in the off-policy learning context and with the simulation-based least square...
This paper examines a number of solution methods for decision processes with non-Markovian rewards (NMRDPs). They all exploit a temporal logic specification of the reward functio...
Both human and automated tutors must infer what a student knows and plan future actions to maximize learning. Though substantial research has been done on tracking and modeling stu...
Anna N. Rafferty, Emma Brunskill, Thomas L. Griffi...
Unlike mono-agent systems, multi-agent planing addresses the problem of resolving conflicts between individual and group interests. In this paper, we are using a Decentralized Ve...
For a given problem, the optimal Markov policy over a finite horizon is a conditional plan containing a potentially large number of branches. However, there are applications wher...