In this paper, we describe a purely image-based system that allows a user to interactively capture the 3D geometry of a polyhedral scene with the aid of its physical presence. A video camera is used as both an interaction and tracking device. The 3D user interface is intuitive to a non-expert and the mouseless control procedure makes the system particularly suitable for mobile devices such as PDAs and mobile phones. The efficiency and accuracy of the method are demonstrated on a polyhedral scene made of two house-like boxes.