Several researchers have shown that the efficiency of value iteration, a dynamic programming algorithm for Markov decision processes, can be improved by prioritizing the order of Bellman backups to focus computation on states where the value function can be improved the most. In previous work, a priority queue has been used to order backups. Although this incurs overhead for maintaining the priority queue, previous work has argued that the overhead is usually much less than the benefit from prioritization. However this conclusion is usually based on a comparison to a non-prioritized approach that performs Bellman backups on states in an arbitrary order. In this paper, we show that the overhead for maintaining the priority queue can be greater than the benefit, when it is compared to very simple heuristics for prioritizing backups that do not require a priority queue. Although the order of backups induced by our simple approach is often sub-optimal, we show that its smaller overhead...
Peng Dai, Eric A. Hansen