Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. In this p...
Abstract. This work extends studies of Angluin, Lange and Zeugmann on the dependence of learning on the hypotheses space chosen for the class. In subsequent investigations, uniform...
We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics which are we...