For many problems which would be natural for reinforcement learning, the reward signal is not a single scalar value but has multiple scalar components. Examples of such problems i...
We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is...
We present an improvement of Noviko 's perceptron convergence theorem. Reinterpreting this mistakebound as a margindependent sparsity guarantee allows us to give a PAC{style ...
Thore Graepel, Ralf Herbrich, Robert C. Williamson
We study the problem of combining the outcomes of several different classifiers in a way that provides a coherent inference that satisfies some constraints. In particular, we deve...
We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process.The language includes s...