Sciweavers

38 search results - page 4 / 8
» On the Convergence of Optimistic Policy Iteration
ICML
2010
IEEE
13 years 8 months ago
Convergence of Least Squares Temporal Difference Methods Under General Conditions
We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) in the off-policy learning context and with the simulation-based least square...
Huizhen Yu
CORR
2008
Springer
13 years 7 months ago
Adaptive Sum Power Iterative Waterfilling for MIMO Cognitive Radio Channels
In this paper, the sum capacity of the Gaussian Multiple Input Multiple Output (MIMO) Cognitive Radio Channel (MCC) is expressed as a convex problem with finite number of...
Rajiv Soundararajan, Sriram Vishwanath
MICRO
2006
IEEE
14 years 1 month ago
Merging Head and Tail Duplication for Convergent Hyperblock Formation
VLIW and EDGE (Explicit Data Graph Execution) architectures rely on compilers to form high-quality hyperblocks for good performance. These compilers typically perform hyperblock f...
Bertrand A. Maher, Aaron Smith, Doug Burger, Kathr...
GLOBECOM
2009
IEEE
13 years 11 months ago
Stochastic Resource Allocation over Fading Multiple Access and Broadcast Channels
In this paper, we consider the optimal rate and power allocation that maximizes a general utility function of average user rates in a fading multiple-access or broadcast channel. B...
Na Gao, Xin Wang
AI
2002
Springer
13 years 7 months ago
Multiagent learning using a variable learning rate
Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on ...
Michael H. Bowling, Manuela M. Veloso