Reinforcement learning (RL) problems constitute an important class of learning and control problems faced by artificial intelligence systems. In these problems, the task is to provide control signals that maximize some measure of performance, usually taken over time, given feedback that is not expressed in terms of the control signals themselves. This feedback is often called "reward" or "punishment." These tasks nonetheless relate directly to engineering control, as well as to the more cognitive, intelligence-related areas that these terms suggest (Barto, 1990). In recent years, many algorithms for RL have been suggested and refined. Notable among them are those discussed by Sutton, Barto, and Watkins (1989), Holland's Bucket Brigade (Holland et al., 1986), and Watkins's Q-learning algorithm (Watkins and Dayan, 1992). Despite these advances, there remain no standard analytical methods or test suites for empirically evaluating reinforcement learn...
Robert E. Smith