This paper proposes a novel approach to extracting meaningful content information from video through the collaborative integration of image understanding and natural language processing. As a concrete example, we developed a system called Name-It that associates faces and names in video. Given news videos as a knowledge source, Name-It automatically extracts face-name associations as content information. The system can infer the name of a given unknown face image, or retrieve the faces most likely to bear a given name. This paper explains the method and presents several successful matching results, which demonstrate the effectiveness of integrating heterogeneous techniques as well as the importance of extracting real content information from video, especially face-name associations.

This material is based upon work supported by the National Science Foundation under Cooperative Agreement No. IRI-9411299. Any opinions, findings, and conclusions or recommendations expressed in this material are tho...