In this paper, we present a general framework for object detection and segmentation. A bottom-up, unsupervised merging algorithm first builds a region-based hierarchy that represents the image at different resolution levels. Object class knowledge is then applied top-down to select and combine regions from the hierarchy, so as to define the exact object shape. We illustrate the usefulness of the approach with four object classes: sky, caption text, traffic signs, and faces.
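
To make the two stages concrete, the following is a minimal sketch of the general idea, not the algorithm used in this work. It assumes a grayscale image given as a list of rows, 4-connected pixels as the initial regions, and the difference of mean intensities as the merging criterion. The names `build_region_hierarchy`, `select_regions`, and `is_object` are hypothetical and chosen only for illustration; the class model is reduced to a simple predicate on a region.

```python
from itertools import count


def build_region_hierarchy(image):
    """Bottom-up stage (illustrative): greedily merge the two most similar
    adjacent regions until one region remains, recording each merge as a
    node of a binary tree.

    Returns (nodes, root_id). Each node is a dict with 'pixels', 'mean',
    and 'children' (a pair of node ids, or () for a single-pixel leaf).
    """
    h, w = len(image), len(image[0])
    # Leaves: one region per pixel, with id = row * width + column.
    nodes = {
        r * w + c: {"pixels": [(r, c)], "mean": float(image[r][c]), "children": ()}
        for r in range(h)
        for c in range(w)
    }
    # 4-connected adjacency between the initial regions.
    neighbors = {rid: set() for rid in nodes}
    for r in range(h):
        for c in range(w):
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < h and cc < w:
                    a, b = r * w + c, rr * w + cc
                    neighbors[a].add(b)
                    neighbors[b].add(a)

    next_id = count(h * w)
    active = set(nodes)
    while len(active) > 1:
        # Assumed criterion: smallest difference of mean intensity.
        _, a, b = min(
            (abs(nodes[a]["mean"] - nodes[b]["mean"]), a, b)
            for a in active
            for b in neighbors[a]
            if a < b
        )
        pa, pb = nodes[a]["pixels"], nodes[b]["pixels"]
        mean = (nodes[a]["mean"] * len(pa) + nodes[b]["mean"] * len(pb)) / (
            len(pa) + len(pb)
        )
        new = next(next_id)
        nodes[new] = {"pixels": pa + pb, "mean": mean, "children": (a, b)}
        # Rewire adjacency so the merged region replaces its two children.
        merged_neighbors = (neighbors.pop(a) | neighbors.pop(b)) - {a, b}
        neighbors[new] = merged_neighbors
        for n in merged_neighbors:
            neighbors[n] -= {a, b}
            neighbors[n].add(new)
        active -= {a, b}
        active.add(new)
    return nodes, next(iter(active))


def select_regions(nodes, root_id, is_object):
    """Top-down stage (illustrative): descend from the root and keep the
    largest regions that the class-specific predicate accepts."""
    selected, stack = [], [root_id]
    while stack:
        rid = stack.pop()
        if is_object(nodes[rid]):
            selected.append(rid)
        else:
            stack.extend(nodes[rid]["children"])
    return selected


if __name__ == "__main__":
    # Toy image: a dark area with a bright column on the right.
    image = [[10, 10, 200], [10, 10, 200]]
    nodes, root = build_region_hierarchy(image)
    # Hypothetical class model: a region is "object" if it is bright.
    bright = select_regions(nodes, root, lambda region: region["mean"] > 150)
    for rid in bright:
        print(sorted(nodes[rid]["pixels"]))
```

In this sketch the hierarchy is built once, without any class knowledge, and different predicates can then be run over the same tree. A realistic class model would combine several region descriptors (for example colour, shape, or position) rather than a single intensity threshold.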