Natural policy gradient methods and the covariance matrix adaptation evolution strategy (CMA-ES), two variable-metric methods proposed for solving reinforcement learning tasks, are contrasted to highlight their conceptual similarities and differences. Experiments on the cart-pole benchmark are conducted as a first attempt to compare their performance.