Matthew E. Taylor, Brian Kulis, Fei Sha

ABSTRACT
A key component of any reinforcement learning algorithm is the underlying representation used by the agent. While reinforcement learning (RL) agents have typically relied on hand-coded state representations, there has been growing interest in learning these representations. Although the inputs to an agent are typically fixed (e.g., the state variables correspond to sensors on a robot), it is desirable to automatically determine the optimal relative scaling of those inputs and to diminish the impact of irrelevant features. This work introduces HOLLER, a novel distance metric learning algorithm, and combines it with an existing instance-based RL algorithm to achieve precisely these goals. The algorithms' success is demonstrated empirically on a set of six tasks within the mountain car domain.

Categories and Subject Descriptors
I.2.6 [Learning]: Miscellaneous

General Terms
Algorithms, Performance

Keywords
Reinforcement Learning, Distance Metric Learning, Autonomous Feature Selection
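The abstract does not specify HOLLER's formulation, so the following is only a minimal, hypothetical sketch of the general idea it describes: a learned diagonal distance metric that rescales state inputs and drives the weights of irrelevant features toward zero, used inside an instance-based value estimate. All function names, the stored Q-values, and the weight vector below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def weighted_distance(x, y, w):
    """Distance under learned per-feature weights w >= 0.
    A large w[i] stretches feature i; w[i] near 0 renders it irrelevant."""
    diff = x - y
    return np.sqrt(np.sum(w * diff * diff))

def knn_q_estimate(query, states, q_values, w, k=5):
    """Estimate Q(query) as the mean Q-value of the k stored instances
    nearest to the query under the learned metric (hypothetical helper)."""
    dists = np.array([weighted_distance(query, s, w) for s in states])
    nearest = np.argsort(dists)[:k]
    return q_values[nearest].mean()

# Illustration only: a mountain car state is (position, velocity), and a
# learned metric might upweight velocity relative to position.
rng = np.random.default_rng(0)
states = rng.uniform([-1.2, -0.07], [0.6, 0.07], size=(100, 2))
q_values = rng.normal(size=100)   # placeholder stored Q-values, not real data
w = np.array([1.0, 50.0])         # hypothetical learned feature weights
print(knn_q_estimate(np.array([0.0, 0.01]), states, q_values, w))
```

Under this reading, metric learning and instance-based RL compose naturally: the agent's value estimates depend only on distances between stored experiences, so improving the metric directly improves generalization without changing the RL algorithm itself.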