Sciweavers

» Negotiating Using Rewards
MIRRORBOT
2005
Springer
14 years 2 months ago
Learning to Interpret Pointing Gestures: Experiments with Four-Legged Autonomous Robots
This paper explores the hypothesis that pointing gesture recognition can be learned using a reward based system. An experiment with two four-legged robots is presented. One of the...
Verena Vanessa Hafner, Frédéric Kapl...
AUSAI
1999
Springer
14 years 1 month ago
Q-Learning in Continuous State and Action Spaces
Abstract. Q-learning can be used to learn a control policy that maximises a scalar reward through interaction with the environment. Q-learning is commonly applied to problems with d...
Chris Gaskett, David Wettergreen, Alexander Zelins...
NECO
2010
13 years 7 months ago
Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
Most conventional Policy Gradient Reinforcement Learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the pol...
Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto...
WPES
2003
ACM
14 years 2 months ago
Policy migration for sensitive credentials in trust negotiation
Trust negotiation is an approach to establishing trust between strangers through the bilateral, iterative disclosure of digital credentials. Under automated trust negotiation, acc...
Ting Yu, Marianne Winslett
AAAI
2007
13 years 11 months ago
Authorial Idioms for Target Distributions in TTD-MDPs
In designing Markov Decision Processes (MDP), one must define the world, its dynamics, a set of actions, and a reward function. MDPs are often applied in situations where there i...
David L. Roberts, Sooraj Bhat, Kenneth St. Clair, ...