Recommender systems, which suggest items of potential interest to users of e-commerce sites, typically adopt a static view of the recommendation process and treat it as a prediction problem. In an earlier paper, we argued that it is more appropriate to view the generation of recommendations as a sequential decision problem and, consequently, that Markov decision processes (MDPs) provide a better model for recommender systems. MDPs offer two benefits: they take into account both the long-term effects and the expected value of each recommendation. The use of MDPs in a commercial site raises three fundamental problems: providing an adequate initial model, updating this model online as new items (e.g., books) arrive, and coping with the enormous state space of this model. In past work, we dealt with the first problem. In this paper we consider the second and, in particular, the third problem, which is of greater concern ...
Ronen I. Brafman, David Heckerman, Guy Shani
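To illustrate the two benefits mentioned in the abstract, the following is a minimal sketch, not the paper's actual model: a toy MDP-style recommender in which states are the user's most recent item, and the transition probabilities, rewards, and discount factor are invented values. It shows how value iteration lets the system recommend the item with the highest expected long-term value rather than merely the most likely next purchase.

```python
# Illustrative sketch only: a tiny MDP-style recommender solved by value
# iteration. States, transition probabilities, and rewards are toy values,
# not the paper's model or data.

items = ["bookA", "bookB", "bookC"]
states = ["start"] + items  # toy state: the last item acquired, or "start"

def transition(s, r):
    # Probability of reaching each next state after recommending r in state s.
    # (Toy numbers; the paper estimates these from observed purchase sequences.)
    accept = 0.6 if s == "start" else 0.4       # chance the user takes r
    probs = {r: accept}
    for other in items:
        if other != r:
            probs[other] = (1.0 - accept) / (len(items) - 1)
    return probs

def reward(s, r, s_next):
    # Profit when the user ends up acquiring s_next (toy margins).
    margins = {"bookA": 1.0, "bookB": 2.0, "bookC": 0.5}
    return margins[s_next]

gamma = 0.9  # discount factor: weights the long-term effects of a recommendation

# Value iteration: V(s) converges to the best achievable discounted value.
V = {s: 0.0 for s in states}
for _ in range(100):
    V = {
        s: max(
            sum(p * (reward(s, r, s2) + gamma * V[s2])
                for s2, p in transition(s, r).items())
            for r in items
        )
        for s in states
    }

def recommend(s):
    # Pick the recommendation with the highest expected long-term value,
    # not just the one most likely to be accepted immediately.
    return max(
        items,
        key=lambda r: sum(p * (reward(s, r, s2) + gamma * V[s2])
                          for s2, p in transition(s, r).items()),
    )

print({s: recommend(s) for s in states})
```

In this toy setting the recommender can prefer a lower-probability item whose acceptance leads to more profitable future states, which is exactly the long-term, expected-value reasoning that a purely predictive (static) recommender cannot express.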