We consider approximate policy evaluation for finite-state, finite-action Markov decision processes (MDPs) in the off-policy learning context and with the simulation-based least square...
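The simulation-based least squares idea can be illustrated with a minimal off-policy LSTD(0) sketch. It is one common off-policy variant, not necessarily the specific method of the work summarized above. It assumes a small tabular MDP given as arrays `P[s, a, s']` and `R[s, a]`, a known behavior policy `mu` and target policy `pi`, uniformly sampled start states, and per-step importance ratios `rho = pi(a|s) / mu(a|s)`. The names `simulate`, `lstd_off_policy`, and `exact_value` are hypothetical.

```python
import numpy as np


def simulate(P, R, mu, n, rng):
    """Draw n transitions (s, a, r, s') under the behavior policy mu.

    Illustrative assumption: start states are drawn uniformly at random.
    """
    n_states, n_actions = mu.shape
    samples = []
    for _ in range(n):
        s = int(rng.integers(n_states))
        a = int(rng.choice(n_actions, p=mu[s]))
        s_next = int(rng.choice(n_states, p=P[s, a]))
        samples.append((s, a, R[s, a], s_next))
    return samples


def lstd_off_policy(samples, phi, pi, mu, gamma):
    """Off-policy LSTD(0): accumulate A and b with importance ratios, then solve A theta = b."""
    k = phi.shape[1]
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        rho = pi[s, a] / mu[s, a]  # importance ratio: target over behavior
        f, f_next = phi[s], phi[s_next]
        A += rho * np.outer(f, f - gamma * f_next)
        b += rho * r * f
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)  # least squares solve of A theta = b
    return theta


def exact_value(P, R, pi, gamma):
    """Reference V^pi from the Bellman equation (I - gamma P_pi) V = r_pi."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = (pi * R).sum(axis=1)
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)


if __name__ == "__main__":
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.7, 0.3], [0.1, 0.9]]])   # P[s, a, s']
    R = np.array([[1.0, 0.0], [0.0, 0.5]])     # R[s, a]
    mu = np.full((2, 2), 0.5)                   # behavior policy (uniform)
    pi = np.array([[0.8, 0.2], [0.3, 0.7]])     # target policy
    samples = simulate(P, R, mu, 40000, np.random.default_rng(0))
    phi = np.eye(2)                             # tabular (one-hot) features
    print(lstd_off_policy(samples, phi, pi, mu, 0.5))
    print(exact_value(P, R, pi, 0.5))
```

With one-hot features the estimate should approach V^pi as the number of samples grows. A coarser feature matrix `phi` plugs in unchanged, although off-policy linear TD fixed points are not guaranteed to be well behaved in general.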
We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision ...
The progressive processing model allows a system to trade off resource consumption against the quality of the outcome by mapping each activity to a graph of potential solution met...
In modern automatic speech recognition systems, it is standard practice to cluster several logical hidden Markov model states into one physical, clustered state. Typically, the cl...
We consider opportunistic spectrum access for secondary users over multiple channels whose occupancy by primary users is modeled as discrete-time Markov processes. Due to hardware...
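Markov occupancy models like this are often made concrete as a two-state (idle/busy) chain per channel. In that setting, the secondary user tracks a belief: the probability that the channel will be idle in the next slot. The sketch below assumes known transition probabilities `p01` (busy to idle) and `p11` (idle to idle), error-free sensing, and a simple myopic channel choice. `MarkovChannel`, `update_belief`, and `myopic_choice` are hypothetical names, and the sensing and access policy in the work summarized above may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MarkovChannel:
    """Two-state (idle/busy) discrete-time Markov model of primary-user occupancy."""
    p01: float  # P(idle in next slot | busy now)
    p11: float  # P(idle in next slot | idle now)

    def update_belief(self, omega: float, observed_idle: Optional[bool]) -> float:
        """Return P(idle in next slot) from belief omega and this slot's observation.

        observed_idle is None when the channel was not sensed in this slot.
        """
        if observed_idle is None:
            # Propagate the belief one step through the transition probabilities.
            return omega * self.p11 + (1.0 - omega) * self.p01
        # Error-free sensing (an assumption) reveals the current state exactly.
        return self.p11 if observed_idle else self.p01

    def stationary_idle(self) -> float:
        """Long-run fraction of idle slots: p01 / (p01 + 1 - p11)."""
        return self.p01 / (self.p01 + 1.0 - self.p11)


def myopic_choice(beliefs: Sequence[float]) -> int:
    """Sense the channel with the highest idle probability (illustrative rule)."""
    return max(range(len(beliefs)), key=lambda i: beliefs[i])


if __name__ == "__main__":
    channels = [MarkovChannel(p01=0.3, p11=0.9), MarkovChannel(p01=0.1, p11=0.6)]
    beliefs = [ch.stationary_idle() for ch in channels]
    k = myopic_choice(beliefs)
    # Suppose sensing finds channel k busy; unsensed channels only propagate.
    beliefs = [ch.update_belief(w, False if i == k else None)
               for i, (ch, w) in enumerate(zip(channels, beliefs))]
    print(k, beliefs)
```

Starting each belief at the stationary idle probability is a convenient initialization. Unsensed beliefs relax back toward that value at a geometric rate.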