Sciweavers

» Parameter-exploring policy gradients
DAC
2008
ACM
14 years 8 months ago
Temperature management in multiprocessor SoCs using online learning
In deep submicron circuits, thermal hot spots and high temperature gradients increase the cooling costs, and degrade reliability and performance. In this paper, we propose a low-co...
Ayse Kivilcim Coskun, Tajana Simunic Rosing, Kenny...
CDC
2010
IEEE
136 views, Control Systems
13 years 2 months ago
Pathologies of temporal difference methods in approximate dynamic programming
Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated conve...
Dimitri P. Bertsekas
PKDD
2009
Springer
181 views, Data Mining
14 years 2 months ago
Active Learning for Reward Estimation in Inverse Reinforcement Learning
Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, w...
Manuel Lopes, Francisco S. Melo, Luis Montesano
ICRA
2008
IEEE
129 views, Robotics
14 years 2 months ago
Compliant manipulation for peg-in-hole: Is passive compliance a key to learn contact motion?
We examine the usefulness of passive compliance in a manipulator that learns contact motion. Based on the observation that humans outperform robots at contact motion, we foll...
Seung-kook Yun
NIPS
2003
13 years 9 months ago
Extending Q-Learning to General Adaptive Multi-Agent Systems
Recent multi-agent extensions of Q-Learning require knowledge of other agents’ payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This pap...
Gerald Tesauro