Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off policy. ...
Caustics produce beautiful and intriguing illumination patterns. However, their complex behavior make them difficult to simulate accurately in all but the simplest configurations....
We describe a computer program to assist a clinician with assessing the e cacy of treatments in experimental studies for which treatment assignment is random but subject complianc...
— In many applications of failure time data analysis, it is important to perform inferences about the median of the distribution function in situations of failure time data model...