This paper presents a self-organizing cognitive architecture, known as TD-FALCON, that learns to function through its interaction with the environment. TD-FALCON learns the value ...
In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function....
Due to the non-stationary environment, learning in multi-agent systems is a challenging problem. This paper first introduces a new gradient-based learning algorithm, augmenting th...
Abstract. Infinite-horizon multi-agent control processes with nondeterminism and partial state knowledge have particularly interesting properties with respect to adaptive control, ...
We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previous proposed criteria. Our cr...