We present a fully automatic method for obtaining the approximate calibration of a camera (alignment to a world frame and focal length) from a single image of an unknown scene, provided only that the scene satisfies a Manhattan world assumption. This assumption states that the imaged scene contains three dominant, mutually orthogonal directions, and it is often satisfied by outdoor or indoor views of man-made structures and environments. The proposed method combines the calibration likelihood introduced in [5] with a stochastic search algorithm to obtain a MAP estimate of the camera's focal length and alignment. Results on real images of indoor scenes are presented. The calibrations obtained are less accurate than those from standard methods employing a calibration pattern or multiple images. However, they are accurate enough for common vision tasks such as tracking. Moreover, the results are obtained without any user intervention, from a single image, and without use of a c...
J. Deutscher, Michael Isard, John MacCormick
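
The geometry behind the Manhattan world assumption can be illustrated with a small sketch. It is not the calibration likelihood of [5] or the stochastic search used in the paper. It only shows how a focal length and a camera alignment determine the vanishing points of the three orthogonal directions, and how the focal length can be recovered from two of those points. The function names, the yaw/pitch/roll parameterisation, and the pinhole model with square pixels, zero skew and a known principal point are assumptions made for illustration.

```python
import math

def rotation_zyx(yaw, pitch, roll):
    """Rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll) as nested lists.

    Hypothetical parameterisation of the camera alignment; its columns are
    the three Manhattan directions expressed in camera coordinates.
    """
    cz, sz = math.cos(yaw), math.sin(yaw)
    cy, sy = math.cos(pitch), math.sin(pitch)
    cx, sx = math.cos(roll), math.sin(roll)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def vanishing_points(f, R, principal_point=(0.0, 0.0), eps=1e-12):
    """Project the three world axes to their vanishing points in the image.

    Assumes a pinhole camera with square pixels and zero skew. Returns None
    for a direction parallel to the image plane (vanishing point at infinity).
    """
    px, py = principal_point
    points = []
    for i in range(3):
        dx, dy, dz = R[0][i], R[1][i], R[2][i]
        if abs(dz) < eps:
            points.append(None)
        else:
            points.append((f * dx / dz + px, f * dy / dz + py))
    return points

def focal_from_vanishing_points(v1, v2, principal_point=(0.0, 0.0)):
    """Recover f from the vanishing points of two orthogonal directions.

    Orthogonality of the two directions gives
    (v1 - p) . (v2 - p) + f^2 = 0, with p the principal point.
    """
    px, py = principal_point
    dot = (v1[0] - px) * (v2[0] - px) + (v1[1] - py) * (v2[1] - py)
    if dot >= 0:
        raise ValueError("vanishing points are inconsistent with orthogonal directions")
    return math.sqrt(-dot)

# Round trip: synthesise vanishing points, then recover the focal length.
R = rotation_zyx(0.3, -0.4, 0.2)
vps = vanishing_points(800.0, R, principal_point=(320.0, 240.0))
f_est = focal_from_vanishing_points(vps[0], vps[1], principal_point=(320.0, 240.0))
```

In a noise-free setting such as this one the recovered value matches the focal length used to generate the vanishing points. With real image data the directions are only known approximately, which is one reason a probabilistic estimate such as the MAP approach described above is attractive.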