We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
—This paper considers the design of opportunistic packet schedulers for users sharing a time-varying wireless channel from the performance and the robustness points of view. Firs...