We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
Background: Although short interfering RNA (siRNA) has been widely used for studying gene functions in mammalian cells, its gene silencing efficacy varies markedly and there are o...
Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on ...
This paper introduces a new algorithm, Q2, foroptimizingthe expected output ofamultiinput noisy continuous function. Q2 is designed to need only a few experiments, it avoids stron...
Andrew W. Moore, Jeff G. Schneider, Justin A. Boya...
One of the main problems in probabilistic grammatical inference consists in inferring a stochastic language, i.e. a probability distribution, in some class of probabilistic models...