The ability to locate objects in real-time video and relate them to virtual objects in a database is important in a number of applications, including visually guided robotic navigation, surveillance, and military training and operations. The key problem in assigning 3D coordinates to objects in a video scene is finding accurate correspondences between objects and features in the real and virtual imagery. This paper focuses on developing novel algorithms that use robust structural features to register a virtual scene with the real scene captured by a video camera. We address a number of challenging issues, including finding invariant image features, extracting and matching structural features, and extracting a reliable set of point correspondences from matched feature structures. Experiments and an analysis of the results are presented.
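The feature-matching and point-extraction steps mentioned above can be illustrated with a minimal sketch. This is not the paper's structural-feature algorithm; it is a generic nearest-neighbour descriptor matcher with Lowe's ratio test, which is one common way to extract a reliable set of matched points while discarding ambiguous matches. The toy descriptors, the `ratio_test_match` helper, and the ratio threshold are all illustrative assumptions.

```python
import math

def ratio_test_match(desc_a, desc_b, ratio=0.75):
    """Match descriptors from image A to image B by nearest neighbour,
    keeping a match only when its best distance is clearly smaller than
    the second-best (Lowe's ratio test), to reject ambiguous features."""
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from descriptor i in A to every descriptor in B.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        best, second = dists[0], dists[1]
        if best[0] < ratio * second[0]:
            matches.append((i, best[1]))  # (index in A, index in B)
    return matches

# Toy 2D descriptors: image B repeats A's features with slight noise,
# plus one distractor near an existing feature, making it ambiguous.
a = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
b = [(0.1, 0.0), (5.0, 5.1), (10.0, 0.1), (10.0, 0.12)]
print(ratio_test_match(a, b))  # → [(0, 0), (1, 1)]
```

The third feature in `a` is dropped because two candidates in `b` are nearly equidistant, so the ratio test cannot disambiguate them; the surviving pairs are the "reliable set of points" that a registration step would then consume.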