When controlling dynamic systems, such as mobile robots in uncertain environments, there is a trade off between risk and reward. For example, a race car can turn a corner faster b...
We introduce the ALeRT (Action-dependent Learning Rates with Trends) algorithm that makes two modifications to the learning rate and one change to the exploration rate of traditio...
Maria Cutumisu, Duane Szafron, Michael H. Bowling,...
During the past few years, point-based POMDP solvers have gradually scaled up to handle medium sized domains through better selection of the set of points and efficient backup met...
Guy Shani, Pascal Poupart, Ronen I. Brafman, Solom...
Staffing and scheduling optimization in large multiskill call centers is time-consuming, mainly because it requires lengthy simulations to evaluate performance measures and their ...
The quality of the input system model has a direct bearing on the effectiveness of the system exploration and synthesis tools. Given a well-structured system model, tools today are...