We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). O...
Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
A longstanding goal in planning research is the ability to generalize plans developed for some set of environments to a new but similar environment, with minimal or no replanning....
Carlos Guestrin, Daphne Koller, Chris Gearhart, Ne...
Conditional random fields (CRFs) are graphical models for modeling the probability of labels given the observations. They have traditionally been trained with using a set of obser...
Xinhua Zhang, Douglas Aberdeen, S. V. N. Vishwanat...
— A novel cross-layer call admission control policy is proposed for a general CDMA beamforming system. In contrast to previously proposed call admission control (CAC) policies wh...