Although camera self-calibration and metric reconstruction have been studied extensively over the past decades, automatic metric reconstruction from long video sequences with varying focal length remains very challenging. Several critical issues in practical implementations have not been adequately addressed. For example, how should the initial frames for the projective reconstruction be selected, and by what criteria? How should large zooming be handled? When is the appropriate moment to upgrade the projective reconstruction to a metric one? This paper investigates all of these issues carefully and proposes practical and effective approaches. In particular, we show that the existing image-based distance is not an adequate measure for selecting the initial frames. We propose a novel measure that takes into account the zoom degree and the self-calibration quality as well as the image-based distance. We then introduce a new strategy to decide when to upgrade the ...