The approach presented in this paper tackles the active research problem of fast, automatic modeling of large-scale environments from videos with millions of frames and from collections of tens of thousands of photographs downloaded from the Internet. The approach leverages recent research in robust estimation, image-based recognition, and stereo depth estimation. High computational speed is achieved through parallelization and execution on commodity graphics hardware. The approach achieves real-time reconstruction from video and completes reconstruction from tens of thousands of downloaded images in less than a day on a single commodity computer. We demonstrate modeling results on a variety of real-world video sequences and photo collections.