Decentralized partially observable Markov decision processes (DEC-POMDPs) form a general framework for planning for groups of cooperating agents that inhabit a stochastic and part...
Matthijs T. J. Spaan, Geoffrey J. Gordon, Nikos A....
In this paper, we investigate a new approach to process queries in data stream applications. We show that reference locality characteristics of data streams could be exploited in ...
Feifei Li, Ching Chang, George Kollios, Azer Besta...
In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...
Michael H. Bowling, Alborz Geramifard, David Winga...
Protocols for distributed systems make often use of random transitions to achieve a common goal. A popular example are randomized leader election protocols. We introduce probabilis...
Virtually all methods of learning dynamic systems from data start from the same basic assumption: the learning algorithm will be given a sequence of data generated from the dynami...