This paper presents a new robot-vision system architecture for real-time localization of moving objects. The 6-DOF motion of the objects (three translational and three rotational degrees of freedom) is detected and tracked accurately in clutter using a model-based approach, without prior knowledge of the objects’ initial positions. The architecture combines an object identification task with an object tracking task, and the computational time lag between the two tasks is absorbed by a large frame memory. The tasks are implemented as independent software modules using stereo-vision-based methods that can handle objects of various shapes with edges, from planar to smoothly curved, in cluttered environments. The architecture also makes object tracking failure-recoverable: if a moving object is lost during tracking, the tracking process is recovered automatically. Experimental results obtained with prototype systems demonstrate the effectiveness of the proposed archit...
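
The role of the frame memory can be illustrated with a small timing simulation. The following is only a sketch under simplifying assumptions and not the system described in this paper: frames are represented by their indices, identification is assumed to work on frame 0 and to finish after `ident_latency` further frames have arrived, and the tracker is assumed to process `track_rate` buffered frames per incoming frame. The function name `catch_up` and all of its parameters are hypothetical.

```python
from collections import deque


def catch_up(ident_latency, track_rate, buffer_size, max_frames=10_000):
    """Simulate a tracker catching up with the live stream via frame memory.

    All parameters are hypothetical and chosen for illustration only.
    ident_latency: frames that arrive while identification works on frame 0.
    track_rate:    buffered frames the tracker can process per incoming frame.
    buffer_size:   capacity of the frame memory, in frames.
    Returns the index of the frame at which tracking becomes real-time,
    or None if a needed frame was overwritten or max_frames was reached.
    """
    frames = deque(maxlen=buffer_size)  # frame memory; the oldest frame is dropped when full
    next_to_track = 0                   # identification yields the pose for frame 0

    for t in range(max_frames):
        frames.append(t)                # a new frame arrives from the cameras
        if t < ident_latency:
            continue                    # identification is still running
        for _ in range(track_rate):
            if next_to_track > t:
                break                   # nothing left to process in the buffer
            if frames[0] > next_to_track:
                return None             # the required frame has been overwritten
            next_to_track += 1          # track one buffered frame
        if next_to_track > t:
            return t                    # tracker has reached the newest frame
    return None


if __name__ == "__main__":
    print(catch_up(ident_latency=10, track_rate=2, buffer_size=16))
    print(catch_up(ident_latency=10, track_rate=2, buffer_size=4))
```

In this toy model the tracker can catch up only if the frame memory holds at least every frame that arrives while identification is running, and only if the tracker processes buffered frames faster than new frames arrive.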