Recent research indicates that modern computer workloads (e.g., the processing times of web requests) follow heavy-tailed distributions. In a heavy-tailed distribution there are a large...
Designing the dialogue policy of a spoken dialogue system involves many nontrivial choices. This paper presents a reinforcement learning approach for automatically optimizing a di...
Satinder P. Singh, Diane J. Litman, Michael J. Kea...
We consider the round robin (RR) scheduling policy where the server processes each job in its buffer for at most a fixed quantum, q, in a round-robin fashion. The processor sharin...
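The round-robin policy described in the snippet above can be illustrated with a small simulation. This is a minimal sketch under assumptions not taken from the abstract: all jobs are already in the buffer at time 0, no new jobs arrive, the server has unit speed, and switching between jobs costs nothing. The function name `round_robin` is hypothetical.

```python
from collections import deque


def round_robin(job_sizes, q):
    """Simulate RR with quantum q; return each job's completion time.

    Assumes (hypothetically) that every job is present at time 0,
    the server works at unit speed, and context switches are free.
    """
    if q <= 0:
        raise ValueError("quantum q must be positive")
    # Buffer of (job id, remaining work), served in FIFO order.
    queue = deque(enumerate(job_sizes))
    clock = 0.0
    completion = [0.0] * len(job_sizes)
    while queue:
        job, remaining = queue.popleft()
        served = min(q, remaining)  # serve for at most one quantum
        clock += served
        remaining -= served
        if remaining > 0:
            queue.append((job, remaining))  # unfinished: back of the line
        else:
            completion[job] = clock
    return completion


print(round_robin([3, 1, 2], q=1))  # prints [6.0, 2.0, 5.0]
```

With a quantum at least as large as the biggest job (e.g. `q=10` here), every job finishes in its first turn, so RR reduces to first-come-first-served and the completion times become `[3.0, 4.0, 6.0]`.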
The problem of sensor activation in a controlled discrete event system is considered. Sensors are assumed to be costly and can be turned on/off during the operation of the syst...
Policy gradient (PG) reinforcement learning algorithms have strong (local) convergence guarantees, but their learning performance is typically limited by a large variance in the e...