Text detection and tracking is an important step in a video content analysis system as it brings important semantic clues which is a vital supplemental source of index information. While there has been a significant amount of research done on video text detection and tracking, there are very few works on performance evaluation of such systems. Evaluations of this nature have not been attempted because of the extensive effort required to establish a reliable ground truth even for a moderate video dataset. However, such ventures are gaining importance now. In this paper, we propose a generic method for evaluation of object detection and tracking systems in video domains where ground truth objects can be bounded by simple geometric shapes (polygons, ellipses). Two comprehensive measures, one each for detection and tracking, are proposed and substantiated to capture different aspects of the task in a single score. We choose text detection and tracking tasks to show the effectiveness of our...