TD() is a popular family of algorithms for approximate policy evaluation in large MDPs. TD() works by incrementally updating the value function after each observed transition. It h...
: We present a distributed learning algorithm for optimizing transit prices in the inter-domain routing framework. We present a combined game theoretical and distributed algorithmi...
Partially observed actions are observations of action executions in which we are uncertain about the identity of objects, agents, or locations involved in the actions (e.g., we kn...
Abstract— The paper considers the algorithm NLU for distributed (vector) parameter estimation in sensor networks, where, the local observation models are nonlinear, and inter-sen...
It has been shown that the problem of 1-penalized least-square regression commonly referred to as the Lasso or Basis Pursuit DeNoising leads to solutions that are sparse and there...