This paper presents a monocular vision framework enabling feature-oriented appearance-based navigation in large outdoor environments containing other moving objects. The framework is based on a hybrid topological-geometrical environment representation, constructed from a learning sequence acquired while the robot moves under human control. The framework achieves the desired navigation functionality without requiring global geometrical consistency of the underlying environment representation. Its main advantages over conventional alternatives are unlimited scalability, real-time mapping and effortless handling of interconnected environments once the loops have been properly detected. The framework has been validated in demanding, cluttered and interconnected environments, under different imaging conditions. The experiments have been performed on many long sequences acquired from moving cars, as well as in real-time large-scale navigation trials relying exclusively on a s...