In this work, we consider a retailer selling a single product with limited on-hand inventory over a finite selling season. Customer demand arrives according to a Poisson process whose rate is influenced by a single action taken by the retailer (such as a price adjustment, a sales commission, or advertisement intensity). The relationship between the action and the demand rate is not known in advance. The retailer learns the optimal action policy “on the fly” from observed demand reactions while maximizing her total expected revenue. Using the pricing problem as an example, we propose a dynamic “learning-while-doing” algorithm that achieves near-optimal performance. Furthermore, we prove that the convergence rate of our algorithm is almost the fastest among all possible algorithms in terms of asymptotic “regret” (the relative loss compared to the full-information optimal solution). Our result closes the performance gaps between parametric and non-parametric lea...