We propose a framework for policy generation in continuoustime stochastic domains with concurrent actions and events of uncertain duration. We make no assumptions regarding the co...
This paper describes nonparametric Bayesian treatments for analyzing records containing occurrences of items. The introduced model retains the strength of previous approaches that...
A general and expressive model of sequential decision making under uncertainty is provided by the Markov decision processes (MDPs) framework. Complex applications with very large ...
One challenge in text processing is the treatment of case insensitive documents such as speech recognition results. The traditional approach is to re-train a language model exclud...
Cheng Niu, Wei Li 0003, Jihong Ding, Rohini K. Sri...
Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...