Abstract. We consider a control problem where the decision maker interacts with a standard Markov decision process with the exception that the reward functions vary arbitrarily ove...
We provide a provably efficient algorithm for learning Markov Decision Processes (MDPs) with continuous state and action spaces in the online setting. Specifically, we take a mo...
This paper examines the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is not known or is only poorly specified. W...
Recently, a number of researchers have proposed spectral algorithms for learning models of dynamical systems—for example, Hidden Markov Models (HMMs), Partially Observable Marko...
— We consider decision making in a Markovian setup where the reward parameters are not known in advance. Our performance criterion is the gap between the performance of the best ...