We present an approach to active segmentation based on the integration of several cues. It serves as a framework for generating hypotheses about previously unseen objects in natural scenes. Using an approximate Expectation-Maximisation method, the appearance, 3D shape and size of objects are modelled iteratively, with fixation used for unsupervised initialisation. To better cope with situations where an object is hard to segregate from the surface it is placed on, a flat surface model is added to the two hypotheses, figure and ground, typically used in classical figure-ground segmentation. The framework is further extended to include modelling over time, in order to exploit temporal consistency for better segmentation and to facilitate tracking.
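
To make the three-hypothesis idea concrete, the following is a minimal sketch and not the method described above: it runs plain soft-assignment EM over a mixture of three diagonal Gaussians (figure, ground, flat surface) on synthetic points, with the initial labelling seeded from a fixation point. The feature layout (x, y, height above a pre-fitted plane), the function names, the thresholds and the synthetic scene are all assumptions made for illustration; the actual framework uses an approximate EM scheme with richer appearance, shape and size models.

```python
import numpy as np

FIGURE, GROUND, SURFACE = 0, 1, 2  # hypothetical label indices

def make_scene(rng):
    """Synthetic points (x, y, height above a fitted plane); purely illustrative."""
    fig = np.column_stack([rng.normal(0, 0.05, (200, 2)), rng.normal(0.10, 0.02, 200)])
    back = np.column_stack([rng.uniform(-1, 1, (400, 2)), rng.normal(1.0, 0.2, 400)])
    surf = np.column_stack([rng.uniform(-1, 1, (400, 2)), rng.normal(0.0, 0.005, 400)])
    X = np.vstack([fig, back, surf])
    truth = np.repeat([FIGURE, GROUND, SURFACE], [200, 400, 400])
    return X, truth

def init_from_fixation(X, fixation, radius=0.15, plane_tol=0.05, conf=0.98):
    """Assumed initialisation: points near the fixation start as figure,
    remaining points close to the plane as surface, everything else as ground."""
    near = np.linalg.norm(X[:, :2] - fixation, axis=1) < radius
    on_plane = np.abs(X[:, 2]) < plane_tol
    hard = np.where(near, FIGURE, np.where(on_plane, SURFACE, GROUND))
    resp = np.full((len(X), 3), (1 - conf) / 2)
    resp[np.arange(len(X)), hard] = conf
    return resp

def m_step(X, resp, min_var=1e-4):
    """Re-estimate mean, diagonal variance and prior of each hypothesis."""
    weight = resp.sum(axis=0) + 1e-12
    means = (resp.T @ X) / weight[:, None]
    diff = X[:, None, :] - means[None, :, :]
    variances = np.einsum("nk,nkd->kd", resp, diff**2) / weight[:, None] + min_var
    priors = weight / weight.sum()
    return means, variances, priors

def e_step(X, means, variances, priors):
    """Posterior probability of each hypothesis for every point."""
    diff = X[:, None, :] - means[None, :, :]
    log_lik = -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)
    log_post = log_lik + np.log(priors)
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def segment(X, init_resp, n_iter=20):
    """Alternate M- and E-steps, starting from the fixation-based labelling."""
    resp = init_resp
    for _ in range(n_iter):
        resp = e_step(X, *m_step(X, resp))
    return resp.argmax(axis=1), resp

rng = np.random.default_rng(0)
X, truth = make_scene(rng)
labels, resp = segment(X, init_from_fixation(X, fixation=np.zeros(2)))
accuracy = float((labels == truth).mean())
```

In this toy setting the separate surface component is what absorbs the near-zero-height points under the object; with only figure and ground, those points would have to be explained by one of the two remaining models.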