Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated conve...
This paper studies adaptive bilateral negotiation between software agents in e-commerce environments. Specifically, we assume that the agents are self-interested, the environment...
We consider the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes (MDPs) chosen randomly from a fixed but unknow...
Aaron Wilson, Alan Fern, Soumya Ray, Prasad Tadepa...
We consider the task of reinforcement learning in an environment in which rare significant events occur independently of the actions selected by the controlling agent. If these ev...
In a M/M/N+M queue, when there are many customers waiting, it may be preferable to reject a new arrival rather than risk that arrival later abandoning without receiving service. O...