Recognizing human gestures is important for the analysis and indexing of video. To recognize human gestures in video, a large number of training examples must generally be collected for each individual gesture. This process is labor-intensive and error-prone, and it is feasible only for a limited set of gestures. In this paper, we present an approach for automatically segmenting sequences of natural activities into atomic sections and clustering them. Our work is inspired by natural language processing, where words are extracted from long sentences; analogously, we extract primitive gestures from sequences of human motion. Our approach consists of two steps. First, the sequences of human motion are segmented into atomic components and clustered using a Hidden Markov Model, which allows the original sequences to be represented as strings of discrete symbols. Second, we extract a lexicon from these discrete sequences using an algorithm named COMPRESSIVE. Experimental results on music conducting gestures demonstrate the effectiv...
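The lexicon-extraction step can be illustrated with a small sketch. The code below is not the COMPRESSIVE algorithm itself; it is a simplified, greedy, offline phrase-replacement scheme used as an illustrative stand-in. It assumes the input is a list of cluster labels (hypothetical single-letter symbols standing in for the HMM cluster IDs), that candidate phrases are 2 to `max_len` symbols long, and that a phrase is scored with an assumed cost model, `count * (length - 1) - length`: the symbols removed from the sequence minus the length of the stored rule. All function names (`extract_lexicon`, `count_nonoverlapping`, `replace_phrase`, `savings`, `expand`) and the `<R0>`-style nonterminal names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Phrase = Tuple[str, ...]


def count_nonoverlapping(seq: Sequence[str], phrase: Phrase) -> int:
    """Count occurrences of `phrase` in `seq`, scanning left to right without overlap."""
    n, k = len(seq), len(phrase)
    i = count = 0
    while i <= n - k:
        if tuple(seq[i:i + k]) == phrase:
            count += 1
            i += k
        else:
            i += 1
    return count


def replace_phrase(seq: Sequence[str], phrase: Phrase, symbol: str) -> List[str]:
    """Replace the same non-overlapping occurrences counted above with `symbol`."""
    n, k = len(seq), len(phrase)
    out: List[str] = []
    i = 0
    while i < n:
        if i <= n - k and tuple(seq[i:i + k]) == phrase:
            out.append(symbol)
            i += k
        else:
            out.append(seq[i])
            i += 1
    return out


def savings(count: int, length: int) -> int:
    """Assumed cost model: symbols removed from the sequence minus the rule body stored."""
    return count * (length - 1) - length


def extract_lexicon(symbols: Sequence[str], max_len: int = 8) -> Tuple[Dict[str, Phrase], List[str]]:
    """Greedily replace the phrase with the largest positive savings until none is left."""
    seq = list(symbols)
    rules: Dict[str, Phrase] = {}
    while True:
        candidates = {
            tuple(seq[i:i + k])
            for k in range(2, max_len + 1)
            for i in range(len(seq) - k + 1)
        }
        best: Optional[Phrase] = None
        best_key = (0, 0)
        for phrase in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = savings(count_nonoverlapping(seq, phrase), len(phrase))
            key = (gain, len(phrase))  # prefer larger savings, then longer phrases
            if gain > 0 and key > best_key:
                best, best_key = phrase, key
        if best is None:
            return rules, seq
        name = f"<R{len(rules)}>"  # hypothetical nonterminal naming scheme
        rules[name] = best
        seq = replace_phrase(seq, best, name)


def expand(symbol: str, rules: Dict[str, Phrase]) -> Phrase:
    """Recursively expand a nonterminal back into primitive-gesture symbols."""
    if symbol not in rules:
        return (symbol,)
    out: List[str] = []
    for s in rules[symbol]:
        out.extend(expand(s, rules))
    return tuple(out)


if __name__ == "__main__":
    # Hypothetical cluster labels produced by the segmentation/clustering step.
    primitives = list("abcabcabcx")
    rules, compressed = extract_lexicon(primitives)
    print(compressed)
    print({name: expand(name, rules) for name in rules})
```

On the toy input `abcabcabcx`, the phrase `abc` yields the largest savings (3 symbols), so it becomes the single lexicon entry `<R0>`; after the replacement, no phrase in `<R0> <R0> <R0> x` saves anything, so the loop stops. The exhaustive candidate enumeration is roughly quadratic in the sequence length per iteration, which is acceptable for illustration; long motion sequences would likely need an index such as a suffix array.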