In this paper, we consider the problem of planning and learning in the infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative direct policysearc...
Most algorithms for solving Markov decision processes rely on a discount factor, which ensures their convergence. It is generally assumed that using an artificially low discount f...
Abstract— We describe a general method to transform a non-Markovian sequential decision problem into a supervised learning problem using a K-bestpaths algorithm. We consider an a...
The success ofreinforcement learninginpractical problems depends on the ability to combine function approximation with temporal di erence methods such as value iteration. Experime...
One knowledge discovery problem in the rapid response setting is the cost of learning which patterns are indicative of a threat. This typically involves a detailed follow-through,...
Peter Frazier, Warren B. Powell, Savas Dayanik, Pa...