Sciweavers

ALT
2007
Springer
14 years 8 months ago
Tuning Bandit Algorithms in Stochastic Environments
Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. In this p...
Jean-Yves Audibert, Rémi Munos, Csaba Szepe...
ALT
2007
Springer
14 years 8 months ago
Prescribed Learning of R.E. Classes
Abstract. This work extends studies of Angluin, Lange and Zeugmann on the dependence of learning on the hypotheses space chosen for the class. In subsequent investigations, uniform...
Sanjay Jain, Frank Stephan, Nan Ye
ALT
2007
Springer
14 years 8 months ago
Pseudometrics for State Aggregation in Average Reward Markov Decision Processes
We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics which are we...
Ronald Ortner