This paper addresses the problem of scheduling jobs in soft real-time systems, where the utility of completing each job decreases over time. We present a utility-based framework fo...
We use graphical models and structure learning to explore how people learn policies in sequential decision making tasks. Studies of sequential decision-making in humans frequently...
We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications...
Project-based work via telecommunications requires the instructor and the students to take explicit steps to create an on-line community that is focused on high quality learning a...
Multiarmed bandit problem is a typical example of a dilemma between exploration and exploitation in reinforcement learning. This problem is expressed as a model of a gambler playi...