We present an efficient approach that merges the virtual objects into video sequences taken by a freely moving camera in a realistic manner. The composition is visually and geometrically consistent through three main steps. First, a robust camera tracking algorithm based on key frames is proposed, which precisely recovers the focal length with a novel multi-frame strategy. Next, the concerned 3D models of the real scenes are reconstructed by means of an extended multi-baseline algorithm. Finally, the virtual objects in the form of 3D models are integrated into the real scenes, with special cares on the interaction consistency including shadow casting, occlusions and object animation. A variety of experiments have been implemented, which demonstrate the robustness and efficiency of our approach. 1