This paper describes a vision-based system for autonomous urban transport missions in outdoor environments. Specialized modules are implemented for particular tasks such as lane tracking and navigating through intersections. A system that can execute a complex mission cannot simply be the sum of its perceptual modalities; it requires a "plan" that uses high-level knowledge about goals and intentions to direct the behaviors of the low-level perception and actuation modules. The system presented in this work takes on the challenge of making a real robot work in the real world, in real time.
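The separation between a high-level plan and low-level behavior modules can be sketched as follows. This is purely an illustration of the idea, not the paper's implementation; the class, route representation, and behavior names are hypothetical:

```python
from enum import Enum, auto

class Behavior(Enum):
    """Low-level perception/actuation modes (hypothetical names)."""
    LANE_TRACKING = auto()
    INTERSECTION_NAVIGATION = auto()
    STOP = auto()

class MissionPlanner:
    """Hypothetical high-level planner: maps mission route segments
    to the low-level behavior module that should be active."""
    def __init__(self, route):
        # Route as a sequence of segment types, e.g. "road" or "intersection".
        self.route = list(route)
        self.step = 0

    def current_behavior(self):
        if self.step >= len(self.route):
            return Behavior.STOP
        segment = self.route[self.step]
        return (Behavior.INTERSECTION_NAVIGATION
                if segment == "intersection"
                else Behavior.LANE_TRACKING)

    def segment_completed(self):
        # Called when perception reports the current segment is finished;
        # the plan then selects the behavior for the next segment.
        self.step += 1

planner = MissionPlanner(["road", "intersection", "road"])
print(planner.current_behavior().name)  # LANE_TRACKING
planner.segment_completed()
print(planner.current_behavior().name)  # INTERSECTION_NAVIGATION
```

The point of the sketch is only that mission-level knowledge (the route) selects which specialized low-level module runs at each moment, rather than the modules running independently.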