It is now accepted that the most effective video shot retrieval is based on indexing and retrieving clips using multiple, parallel modalities such as text-matching, image-matching a...
Text retrieval from broadcast news video is unsatisfactory because a transcript word frequently does not directly ‘describe’ the shot during which it was spoken. Extending the retriev...
The human visual system actively seeks interesting regions in images to reduce the search effort in tasks such as object detection and recognition. Similarly, prominent actions in v...
Multi-modal person authentication systems can achieve higher performance and robustness by combining different modalities. The current fusion strategies of different modalities ar...