We present a method for automatically extracting a salient object from a single image, cast in an energy minimization framework. Unlike most previous methods, which leverage only appearance cues, we employ an auto-context cue as a complementary data term. Bootstrapped by a generic saliency model, the segmentation of the salient object and the learning of the auto-context model are performed iteratively without any user intervention. Upon convergence, we obtain not only a clean separation of the salient object but also an auto-context classifier that can recognize the same type of object in other images. Our experiments on four benchmarks demonstrate the efficacy of the added contextual cue. Our method compares favorably with the state of the art, including some approaches that rely on user interaction.
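The alternation described above — bootstrap from a generic saliency estimate, then iterate between segmentation and context learning until the labeling stabilizes — can be sketched as follows. This is a minimal toy illustration on a 1D "image", not the paper's actual energy or classifier; all function names, the neighborhood-average context model, and the weighting `w` are illustrative assumptions.

```python
# Hypothetical sketch of the alternating scheme: segmentation and
# auto-context learning are iterated from a generic saliency bootstrap
# until the labeling stops changing. The toy 1D "image" and the simple
# neighborhood-average context model are illustrative, not the paper's.

def bootstrap_saliency(pixels):
    """Generic saliency bootstrap: pixels brighter than average are tentatively salient."""
    thresh = sum(pixels) / len(pixels)
    return [1 if p > thresh else 0 for p in pixels]

def learn_context(labels, radius=1):
    """Auto-context model (toy): per-pixel prior from neighboring labels."""
    n = len(labels)
    prior = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = labels[lo:hi]
        prior.append(sum(window) / len(window))
    return prior

def segment(pixels, context_prior, w=0.5):
    """Energy-style decision combining an appearance term and the context term."""
    thresh = sum(pixels) / len(pixels)
    appearance = [1.0 if p > thresh else 0.0 for p in pixels]
    score = [(1 - w) * a + w * c for a, c in zip(appearance, context_prior)]
    return [1 if s >= 0.5 else 0 for s in score]

def extract_salient(pixels, max_iters=10):
    """Alternate segmentation and context learning until convergence."""
    labels = bootstrap_saliency(pixels)
    for _ in range(max_iters):
        prior = learn_context(labels)
        new_labels = segment(pixels, prior)
        if new_labels == labels:  # labeling unchanged: converged
            break
        labels = new_labels
    return labels

print(extract_salient([0.1, 0.2, 0.9, 0.8, 0.95, 0.15, 0.1]))
# → [0, 0, 1, 1, 1, 0, 0]
```

In the paper's setting, `segment` would minimize the full energy (appearance plus auto-context data terms) and `learn_context` would train the auto-context classifier on the current segmentation; the loop structure is the same.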