Sciweavers

802 search results - page 154 / 161
» Experts in a Markov Decision Process
ICML
2001
IEEE
14 years 8 months ago
Off-Policy Temporal Difference Learning with Function Approximation
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
Doina Precup, Richard S. Sutton, Sanjoy Dasgupta
ICML
1998
IEEE
14 years 8 months ago
Intra-Option Learning about Temporally Abstract Actions
Intra-Option Learning about Temporally Abstract Actions. Richard S. Sutton, Department of Computer Science, University of Massachusetts, Amherst, MA 01003-4610, rich@cs.umass.edu. Doina Precup, D...
Richard S. Sutton, Doina Precup, Satinder P. Singh
WWW
2005
ACM
14 years 8 months ago
Executing incoherency bounded continuous queries at web data aggregators
Continuous queries are used to monitor changes to time varying data and to provide results useful for online decision making. Typically a user desires to obtain the value of some ...
Rajeev Gupta, Ashish Puri, Krithi Ramamritham
INFOCOM
2009
IEEE
14 years 2 months ago
Delay-Optimal Opportunistic Scheduling and Approximations: The Log Rule
This paper considers the design of opportunistic packet schedulers for users sharing a time-varying wireless channel from the performance and the robustness points of view. Firs...
Bilal Sadiq, Seung Jun Baek, Gustavo de Veciana
CPAIOR
2009
Springer
14 years 2 months ago
Optimal Interdiction of Unreactive Markovian Evaders
The interdiction problem arises in a variety of areas including military logistics, infectious disease control, and counter-terrorism. In the typical formulation of network interdi...
Alexander Gutfraind, Aric A. Hagberg, Feng Pan