Physical Unclonable Functions (PUFs) are an emerging cryptographic and security primitive. They can potentially replace secret binary keys in vulnerable hardware systems and offer further security advantages. In this paper, we address the cryptanalysis of this primitive by means of machine learning methods. In particular, we investigate to what extent the security of circuit-based PUFs can be challenged by a new machine learning technique named Policy Gradients with Parameter-based Exploration (PGPE). Our findings show that this technique has several important advantages in the cryptanalysis of PUFs, compared both with its application in other machine learning fields and with other policy gradient methods.