— While the Partially Observable Markov Decision Process (POMDP) provides a formal framework for the problem of robot control under uncertainty, it typically assumes a known and stationary model of the environment. In this paper, we study the problem of finding an optimal policy for controlling a robot in a partially observable domain where the model is not perfectly known and may change over time. We present an algorithm called MEDUSA that incrementally learns a POMDP model using queries while still optimizing a reward function. We demonstrate the effectiveness of the approach in a simple scenario in which a robot seeking a person has minimal a priori knowledge of its own sensor model and of the person's location.