We present a method for efficiently generating a representation of a multi-modal posterior probability distribution. The technique combines ideas from RANSAC and particle filtering so that the 3D visual tracking problem can be partitioned into two levels, while maintaining multiple hypotheses throughout. A simple texture change-point detector finds multiple hypotheses for the positions of image edgels. From these, multiple candidate locations for each scene edge are generated. Finally, we determine the best pose of the whole structure. While the multi-modal representation is strongly related to particle filtering techniques, this approach is driven by data extracted from the image. Hence the resulting system performs robust visual tracking of all six degrees of freedom in real time. The complete tracking system is compared with previous systems on real video sequences.
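To make the two-level structure concrete, the sketch below walks through the pipeline on a toy 1-D problem: a crude change-point detector returns several edgel hypotheses per search line, each scene edge receives several candidate locations drawn from them, and a sampled set of candidate poses is weighted against these hypotheses to yield a multi-modal posterior representation. This is a minimal illustration under stated assumptions, not the paper's implementation; all function names, the detector, and the scalar offset standing in for the full 6-DoF pose are hypothetical.

```python
# Illustrative sketch only: detector, edge model, and scoring are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def detect_changepoints(intensities, threshold=20.0):
    """Level 1: a crude texture change-point detector along a 1-D search line.
    Returns all indices where the local intensity jump exceeds a threshold,
    so each search line can yield multiple edgel hypotheses."""
    diffs = np.abs(np.diff(intensities.astype(float)))
    return np.flatnonzero(diffs > threshold)

def edge_hypotheses(edgel_positions, n_hypotheses=5):
    """Level 2: several candidate locations for one scene edge, drawn at
    random from its edgel hypotheses (a RANSAC-style minimal sample)."""
    if len(edgel_positions) == 0:
        return []
    return [float(rng.choice(edgel_positions)) for _ in range(n_hypotheses)]

def posterior_over_poses(all_edge_hyps, n_samples=200, sigma=2.0):
    """Top level: sample candidate poses (here a scalar offset stands in for
    the full 6-DoF pose), score each against the per-edge hypotheses, and
    return a weighted particle set -- a representation of the multi-modal
    posterior rather than a single best estimate."""
    poses = rng.uniform(0.0, 50.0, size=n_samples)
    weights = np.zeros(n_samples)
    for i, pose in enumerate(poses):
        # Each edge contributes the likelihood of its best-matching hypothesis.
        for hyps in all_edge_hyps:
            if hyps:
                d = min(abs(pose - h) for h in hyps)
                weights[i] += np.exp(-d**2 / (2 * sigma**2))
    weights /= weights.sum()
    return poses, weights

# Toy data: three noisy search lines crossing a step edge near offset 20,
# with a second (distractor) intensity change near offset 35.
lines = [np.r_[np.full(20, 10.0), np.full(15, 90.0), np.full(15, 10.0)]
         + rng.normal(0, 2, 50) for _ in range(3)]
edgels = [detect_changepoints(line) for line in lines]
edge_hyps = [edge_hypotheses(e) for e in edgels]
poses, weights = posterior_over_poses(edge_hyps)
print("MAP pose offset:", poses[np.argmax(weights)])
```

The key design point the sketch tries to reflect is that no level commits to a single answer early: the detector keeps every plausible edgel, each edge keeps several candidate locations, and the output is a weighted set of poses rather than one estimate, so ambiguity in the image survives to the top level instead of being resolved prematurely.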