Abstract. We consider a continuous-time model for inventory management with Markov-modulated, non-stationary demand. We introduce active learning by assuming that the state of the world is unobserved and must be inferred by the manager. We also assume that demand is observed only when it is completely met, so that demand is censored during stockouts. We first derive the explicit filtering equations and then pass to an equivalent, fully observed impulse control problem in terms of the sufficient statistics, namely the a posteriori probability process and the current inventory level. We then solve this equivalent formulation and directly characterize an optimal inventory policy. We also describe a computational procedure for calculating the value function and the optimal policy, and we present two numerical illustrations.