We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret w...
This paper describes an application of recently developed qualitative reasoning techniques to complex, socio-economic allocation problems. We explain why we believe traditional op...
Howard's policy iteration algorithm is one of the most widely used algorithms for finding optimal policies for controlling Markov Decision Processes (MDPs). When applied to we...
Reinforcement learning induces non-stationarity at several levels. Adaptation to non-stationary environments is of course a desired feature of a fair RL algorithm. Yet, even if the...
We consider a manufacturing plant that purchases raw materials for product assembly and then sells the final products to customers. There are M types of raw materials and K types o...