This paper addresses the problem of scheduling jobs in soft real-time systems, where the utility of completing each job decreases over time. We present a utility-based framework for making repeated scheduling decisions based on dynamically observed information about unscheduled jobs and the system's resources. This framework generalizes the standard scheduling problem to a resource-constrained environment, in which resource allocation (RA) decisions (how many CPUs to allocate to each job) must be made concurrently with scheduling decisions (when to execute each job). Discrete-time Optimal Control theory is used to formulate the optimization problem of finding the scheduling/RA policy that maximizes the average utility per time step obtained from completed jobs. We propose a Reinforcement Learning (RL) architecture for solving this NP-hard Optimal Control problem in real time, and our experimental results demonstrate the feasibility and benefits of the proposed approach.
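As a minimal sketch of the objective stated above, the policy search problem can be written as an average-reward optimization; the symbols $\pi$, $U_t$, and $T$ below are illustrative notation introduced here, not necessarily the paper's own:

\[
\pi^{*} \;=\; \arg\max_{\pi}\; \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}\!\left[\, \sum_{t=1}^{T} U_t \,\middle|\, \pi \right],
\]

where $U_t$ denotes the total utility collected from jobs completed at time step $t$ under the combined scheduling/RA policy $\pi$.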