A novel method for summarizing videos of gestures is presented. The gestures performed by the hands and the head are extracted through skin color segmentation and represented by Zernike moments. The gesture energy is calculated from the norms of the Zernike moments and tracked over time for local minima and maxima, which indicate distinctive visual events and thus key-frames. The proposed scheme is not threshold-dependent, so the number of extracted key-frames varies with the complexity of the gesture energy variation. The applicability of the method is verified experimentally on sign language videos.
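The key-frame selection step described above can be illustrated with a minimal sketch. It assumes the Zernike moments of each frame have already been computed and stacked into an array, and it takes the energy of a frame to be the sum of the moment magnitudes; the abstract does not specify the exact formula, so this choice, along with the function names `gesture_energy` and `key_frames`, is an assumption made for illustration only.

```python
import numpy as np


def gesture_energy(moments):
    """Per-frame energy from Zernike moments.

    moments: array-like of shape (n_frames, n_moments), possibly complex.
    Assumption: energy is the sum of moment magnitudes in each frame.
    """
    return np.abs(np.asarray(moments)).sum(axis=1)


def key_frames(energy):
    """Indices of strict local minima and maxima of a 1-D energy signal.

    No threshold is used: a frame is selected whenever the first
    difference of the energy changes sign around it. Flat plateaus
    (zero difference) are ignored in this simple sketch.
    """
    e = np.asarray(energy, dtype=float)
    d = np.sign(np.diff(e))
    # A sign change between consecutive differences marks an extremum
    # at the frame shared by the two differences.
    idx = np.where(d[:-1] * d[1:] < 0)[0] + 1
    return idx.tolist()
```

Because selection depends only on sign changes, a sequence with more energy fluctuations yields more key-frames, which matches the threshold-free behaviour the abstract describes. In practice the energy signal would probably need smoothing first, since noise produces spurious extrema.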
Dimitrios I. Kosmopoulos, Anastasios D. Doulamis,