Sciweavers

209 search results - page 34 / 42
» Optimization and Convergence of Observation Channels in Stoc...
ICASSP
2011
IEEE
13 years 5 days ago
Social norm and long-run learning in peer-to-peer networks
We start by formulating the resource sharing in peer-to-peer (P2P) networks as a random-matching gift-giving game, where self-interested peers aim at maximizing their own long-ter...
Yu Zhang, Mihaela van der Schaar
NIPS
2001
13 years 9 months ago
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
CAD
2004
Springer
13 years 8 months ago
Control point adjustment for B-spline curve approximation
Pottmann et al. propose an iterative optimization scheme for approximating a target curve with a B-spline curve based on square distance minimization, or SDM. The main advantage o...
Huaiping Yang, Wenping Wang, Jia-Guang Sun
ML
1998
ACM
13 years 8 months ago
Elevator Group Control Using Multiple Reinforcement Learning Agents
Recent algorithmic and theoretical advances in reinforcement learning (RL) have attracted widespread interest. RL algorithms have appeared that approximate dynamic programming on an ...
Robert H. Crites, Andrew G. Barto
ICC
2009
IEEE
14 years 3 months ago
Security Games with Incomplete Information
We study two-player security games which can be viewed as sequences of nonzero-sum matrix games where at each stage of the iterations the players make imperfect observations of ...
Kien C. Nguyen, Tansu Alpcan, Tamer Basar