We provide a provably efficient algorithm for learning Markov Decision Processes (MDPs) with continuous state and action spaces in the online setting. Specifically, we take a mo...
We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O ( T) regret. The setting is a natural general...
Introductory computer programming courses are inherently challenging for a variety of reasons. With increased demands for online delivery, the use of effective technologies, materi...
In apprenticeship learning, the goal is to learn a policy in a Markov decision process that is at least as good as a policy demonstrated by an expert. The difficulty arises in tha...