We report on the use of reinforcement learning with Cobot, a software agent residing in the wellknown online community LambdaMOO. Our initial work on Cobot (Isbell et al.2000) provided him with the ability to collect social statistics and report them to users. Here we describe our application of RL to allow Cobot to proactively take actions in this complex social environment, and adapt his behavior from multiple sources of human reward. After 5 months of training, and 3171 reward and punishment events from 254 different LambdaMOO users, Cobot learned nontrivial preferences for a number of users, modifing his behavior based on his current state. Here we describe LambdaMOO and the state and action spaces of Cobot, and report the statistical results of the learning experiment.
Charles Lee Isbell Jr., Christian R. Shelton, Mich