Despite the numerous optimization and evaluation studies that have been conducted with TLBs over the years, there is still a deficiency in an indepth understanding of TLB characte...
We offer a new formal criterion for agent-centric learning in multi-agent systems, that is, learning that maximizes one’s rewards in the presence of other agents who might also...
Abstract. In this paper we study distributed online learning of locomotion gaits for modular robots. The learning is based on a stochastic approximation method, SPSA, which optimiz...
Multiarmed bandit problem is a typical example of a dilemma between exploration and exploitation in reinforcement learning. This problem is expressed as a model of a gambler playi...
In smart grid, a home appliance can adjust its power consumption level according to the realtime power price obtained from communication channels. Most studies on smart grid do not...