The asymptotic equipartition property in reinforcement learning and its relation to return maximization