In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of n trials so as to maximize the total payoff of the chosen strategies. While ...
Computed prediction represents a major shift in learning classifier system research. XCS with computed prediction, based on linear approximators, has been applied so far to functi...
Pier Luca Lanzi, Daniele Loiacono, Stewart W. Wils...
Markov decision processes (MDPs) are controllable discrete event systems with stochastic transitions. The payoff received by the controller can be evaluated in different ways, dep...