Imitation learning, also called learning by watching or programming by demonstration, has emerged as a means of accelerating many reinforcement learning tasks. Previous work has shown the value of imitation in domains where a single mentor demonstrates execution of a known optimal policy for the benefit of a learning agent. We consider the more general scenario of learning from mentors who are themselves agents seeking to maximize their own rewards. We propose a new algorithm based on the concept of transferable utility for ensuring that an observer agent can learn efficiently in the context of a selfish, not necessarily helpful, mentor. We also address the questions of when an imitative agent should request help from a mentor, and when the mentor can be expected to acknowledge a request for help. In analogy with other types of active learning, we call the proposed approach active imitation learning.
Aaron P. Shon, Deepak Verma, Rajesh P. N. Rao