Feature selection and policy optimization for distributed instruction placement using reinforcement learning