In some environments, a learning agent must learn to balance competing objectives. For example, a Q-learning agent may need to learn which choices expose it to risk and which lead toward a goal. In this paper, we present a variant of Q-learning that learns a pair of utilities for worlds with dichotomous attributes, and we show that this algorithm properly balances the competing objectives and, as a result, efficiently identifies satisficing solutions. This occurs because exploration of the environment is restricted to those options that, according to current knowledge, are likely to avoid unjustifiable exposure to risk. We empirically validate the algorithm by (a) showing that it quickly converges to good policies in several simulated worlds of varying complexity and (b) applying it to learning a force-feedback profile for a gas pedal that helps drivers avoid risky situations.
Michael A. Goodrich, Morgan Quigley
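The following is a minimal, hypothetical sketch of the two-utility idea the abstract describes: one Q-table estimates progress toward the goal, a second estimates exposure to risk, and exploration is confined to actions whose estimated goal utility justifies their estimated risk. The class and parameter names here (`SatisficingQLearner`, the risk-aversion index `b`) are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

class SatisficingQLearner:
    """Hypothetical sketch: Q-learning with a pair of utilities.

    q_goal estimates how well an action advances the goal; q_risk
    estimates how much it exposes the agent to risk. Exploration is
    restricted to the "satisficing" set of actions whose goal utility
    is at least b times their risk utility.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, b=1.0, epsilon=0.1):
        self.actions = list(actions)
        self.alpha, self.gamma = alpha, gamma
        self.b = b              # risk-aversion index: higher b => fewer acceptable actions
        self.epsilon = epsilon
        self.q_goal = defaultdict(float)   # (state, action) -> goal utility
        self.q_risk = defaultdict(float)   # (state, action) -> risk utility

    def satisficing_set(self, state):
        """Actions whose estimated goal utility justifies their estimated risk."""
        ok = [a for a in self.actions
              if self.q_goal[(state, a)] >= self.b * self.q_risk[(state, a)]]
        return ok or self.actions          # fall back to all actions if none qualify

    def choose(self, state):
        candidates = self.satisficing_set(state)
        if random.random() < self.epsilon:
            return random.choice(candidates)   # explore only within the satisficing set
        return max(candidates, key=lambda a: self.q_goal[(state, a)])

    def update(self, state, action, goal_reward, risk_cost, next_state):
        # A standard Q-learning backup applied independently to each utility;
        # bootstrapping the risk table with max is a simplification.
        best_g = max(self.q_goal[(next_state, a)] for a in self.actions)
        best_r = max(self.q_risk[(next_state, a)] for a in self.actions)
        key = (state, action)
        self.q_goal[key] += self.alpha * (goal_reward + self.gamma * best_g - self.q_goal[key])
        self.q_risk[key] += self.alpha * (risk_cost + self.gamma * best_r - self.q_risk[key])
```

In this sketch, raising `b` shrinks the satisficing set and makes exploration more conservative, while `b = 0` admits any action with non-negative estimated goal utility, recovering behavior close to ordinary epsilon-greedy Q-learning.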