In ergodic MDPs we consider stationary distributions of policies that coincide in all but n states, in which one of two possible actions is chosen. We give conditions and formulas...
We study logit dynamics [3] for strategic games. At every stage of the game a player is selected uniformly at random and she is assumed to play according to a noisy best-response ...
Vincenzo Auletta, Diodato Ferraioli, Francesco Pas...
In this paper we consider the problem of policy evaluation in reinforcement learning, i.e., learning the value function of a fixed policy, using the least-squares temporal-differe...
We consider a new simulation-based optimization method called the Nested Partitions (NP) method. This method generates a Markov chain and solving the optimization problem is equiv...
One of the main difficulties faced when analyzing Markov chains modelling evolutionary algorithms is that their cardinality grows quite fast. A reasonable way to deal with this iss...