In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are ini...
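To make the setup concrete, here is a minimal sketch of a k-armed bandit with hidden reward distributions. It assumes Bernoulli arms, and it uses an epsilon-greedy player as a stand-in strategy; neither choice comes from the abstract. The names `run_epsilon_greedy`, `arm_means`, `epsilon`, and `seed` are hypothetical.

```python
import random


def run_epsilon_greedy(arm_means, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli k-armed bandit with an epsilon-greedy policy.

    arm_means holds hypothetical success probabilities, one per arm.
    The policy never reads them directly; it only observes sampled rewards.
    Returns (estimated means, pull counts, total reward).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k        # number of times each arm was played
    estimates = [0.0] * k   # running sample mean of each arm's reward
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(k), key=lambda a: estimates[a])
        # Sample a 0/1 reward from the chosen arm's hidden distribution.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total


if __name__ == "__main__":
    est, cnt, tot = run_epsilon_greedy([0.1, 0.5, 0.9])
    print("estimates:", est)
    print("pull counts:", cnt)
    print("total reward:", tot)
```

The player starts with no knowledge of the arms and must trade off trying under-sampled arms against replaying the arm that currently looks best, which is the tension the bandit problem formalizes.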
Given an adequate simulation model of the task environment and a payoff function that measures the quality of partially successful plans, competition-based heuristics such as geneti...
Our goal is to provide learning mechanisms to game agents so they are capable of adapting to new behaviors based on the actions of other agents. We introduce a new on-line reinfor...
Deep-layer machine learning architectures continue to emerge as a promising biologically inspired framework for achieving scalable perception in artificial agents. State inference ...
Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discoun...
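For reference, a common RL objective is the discounted sum of rewards, sum over t of gamma^t * r_t. The sketch below computes it for a finite reward sequence. The function name `discounted_return`, the discount factor `gamma`, and the sample rewards are illustrative assumptions, not taken from the abstract.

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum_t gamma**t * rewards[t] for a finite reward sequence.

    Accumulates from the last reward backwards, so each step is
    G_t = r_t + gamma * G_{t+1}.
    """
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    # Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25
    print(discounted_return([1, 1, 1], gamma=0.5))
```

A gamma below 1 weights near-term rewards more heavily than distant ones; with gamma equal to 1 the function reduces to a plain sum.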