We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to v...
J. Andrew Bagnell, Sham Kakade, Andrew Y. Ng, Jeff...
We consider the problem of browsing the top ranked portion of the documents returned by an information retrieval system. We describe an interactive relevance feedback agent that a...
In this paper, a learning-based local search approach for propositional satisfiability is presented. It is based on an original adaptation of the conflict-driven clause learning ...
Gilles Audemard, Jean-Marie Lagniez, Bertrand Mazu...
Ad hoc networks represent a key factor in the evolution of wireless communications. These networks typically consist of equal nodes that communicate without central control, inter...
Policy Gradients with Parameter-based Exploration (PGPE) is a novel model-free reinforcement learning method that alleviates the problem of high-variance gradient estima...
Frank Sehnke, Alex Graves, Christian Osendorfer, J...