Top-down visual saliency facilities object localization by providing a discriminative representation of target objects and a probability map for reducing the search space. In this paper, we propose a novel top-down saliency model that jointly learns a Conditional Random Field (CRF) and a discriminative dictionary. The proposed model is formulated based on a CRF with latent variables. By using sparse codes as latent variables, we train the dictionary modulated by CRF, and meanwhile a CRF with sparse coding. We propose a max-margin approach to train our model via fast inference algorithms. We evaluate our model on the Graz02 and PASCAL VOC 2007 datasets. Experimental results show that our model performs favorably against the stateof-the-art top-down saliency methods. We also observe that the dictionary update significantly improves the model performance.