Efficient exploration through active learning for value function approximation in reinforcement learning