In deep submicron circuits, thermal hot spots and high temperature gradients increase the cooling costs, and degrade reliability and performance. In this paper, we propose a low-co...
Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated conve...
Abstract. Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, w...
— We examine the usefulness of passive compliance in a manipulator that learns contact motion. Based on the notice that humans outperforms robots with the contact motion, we foll...
Recent multi-agent extensions of Q-Learning require knowledge of other agents’ payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This pap...