Many vision problems can be cast as optimizing the conditional probability density function p(C|I) where I is an image and C is a vector of model parameters describing the image. Ideally, the density function p(C|I) would be smooth and unimodal allowing local optimization techniques, such as gradient descent or simplex, to converge to an optimal solution quickly, while preserving significant nonlinearities of the model. We propose to learn a conditional probability density satisfying these desired properties for the given training data set. To do this, we formulate a novel regression problem that finds a function approximating the target density. Learning the regressor is challenging due to the high dimensionality of model parameters, C, and the complexity of relating the image and the model. Our approach makes two contributions. First, we take a multilevel refinement approach by learning a series of density functions, each of which guides the solution of optimization algorithms incre...