Earlier this year, a major effort was initiated to study the theoretical and empirical aspects of the automatic detection of semantic concepts in broadcast video, complementing ongoing research in video analysis, the TRECVID video analysis evaluations run by the National Institute of Standards and Technology (NIST) in the U.S., and MPEG-7 standardization. The video analysis community has long struggled to bridge the gap from successful low-level feature analysis (color histograms, texture, shape) to semantic content description of video. One approach is to use a set of intermediate textual descriptors that can be reliably applied to visual scenes (e.g., outdoors, faces, animals). If we can define a sufficiently rich set of such intermediate descriptors in the form of large lexicons and taxonomic classification schemes, then these descriptors will enable robust, general-purpose semantic content annotation and retrieval. Our efforts are substantially broad, as our subject matter is broadcast v...
Alexander G. Hauptmann