In ergodic MDPs we consider stationary distributions of policies that coincide in all but n states, in which one of two possible actions is chosen. We give conditions and formulas...
We study logit dynamics [3] for strategic games. At every stage of the game a player is selected uniformly at random and she is assumed to play according to a noisy best-response ...
Vincenzo Auletta, Diodato Ferraioli, Francesco Pas...
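The update step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the standard logit rule, in which the selected player picks strategy s with probability proportional to exp(beta * u_i(s, x_-i)). The inverse-noise parameter name beta, the function names, and the two-player coordination game are assumptions introduced here for illustration.

```python
import math
import random


def logit_update_probs(utilities, beta):
    """Logit (softmax) distribution over strategies, given their utilities.

    beta is the assumed inverse-noise parameter: beta = 0 gives uniform
    play, and large beta approaches a best response.
    """
    m = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(beta * (u - m)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def logit_step(profile, utility, num_strategies, beta, rng):
    """One step: select a player uniformly at random and resample her strategy."""
    i = rng.randrange(len(profile))
    # Utility of each candidate strategy for player i, with the others held fixed.
    utils = [
        utility(i, profile[:i] + (s,) + profile[i + 1:])
        for s in range(num_strategies)
    ]
    probs = logit_update_probs(utils, beta)
    s_new = rng.choices(range(num_strategies), weights=probs)[0]
    return profile[:i] + (s_new,) + profile[i + 1:]


def coordination_utility(i, profile):
    """Hypothetical two-player coordination game: payoff 1 if both match, else 0."""
    return 1.0 if profile[0] == profile[1] else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    profile = (0, 1)
    for _ in range(10):
        profile = logit_step(profile, coordination_utility, 2, beta=2.0, rng=rng)
    print(profile)
```

Iterating logit_step produces a Markov chain over strategy profiles. In this sketch, larger values of beta make the selected player more likely to choose the strategy with the highest utility.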
In this paper we consider the problem of policy evaluation in reinforcement learning, i.e., learning the value function of a fixed policy, using the least-squares temporal-differe...
Alessandro Lazaric, Mohammad Ghavamzadeh, Ré...
We consider a new simulation-based optimization method called the Nested Partitions (NP) method. This method generates a Markov chain and solving the optimization problem is equiv...
One of the main difficulties faced when analyzing Markov chains modelling evolutionary algorithms is that their cardinality grows quite fast. A reasonable way to deal with this iss...