This paper proposes a state-based approach to gesture learning and recognition. Using spatial clustering and temporal alignment, each gesture is defined as an ordered sequence of states in spatial-temporal space. The 2D image positions of the centers of the user's head and both hands are used as features; these are located by a color-based tracking method. From the training data of a given gesture, we first learn the spatial information without segmenting or aligning the data, and then group the data into segments that are automatically associated with information for temporal alignment. The temporal information is then integrated to build a Finite State Machine (FSM) recognizer; each gesture has a corresponding FSM. The computational efficiency of the FSM recognizers allows us to achieve real-time online performance. We apply the proposed technique to build an experimental system that plays a game of “Simon Says” with the user.
Pengyu Hong, Thomas S. Huang, Matthew Turk
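The sketch below is a minimal illustration, not the authors' implementation, of the recognition idea the abstract describes: one FSM per gesture, where each state is a learned spatial region with a minimum dwell time, and the machine advances as tracked feature points fall within successive states. The state centers, radii, dwell counts, and the example gesture are all hypothetical.

```python
import math
from dataclasses import dataclass


def _within(obs, center, radius):
    """True if a 2D feature point lies inside a state's spatial region."""
    return math.dist(obs, center) <= radius


@dataclass
class State:
    center: tuple  # learned spatial centroid (2D image coordinates)
    radius: float  # acceptance radius around the centroid
    min_dwell: int = 1  # temporal info: min frames required to satisfy state


class GestureFSM:
    """One FSM per gesture: an ordered sequence of spatial states."""

    def __init__(self, name, states):
        self.name = name
        self.states = states
        self.reset()

    def reset(self):
        self.index = 0  # current state of the machine
        self.dwell = 0  # consecutive frames spent in the current state

    def step(self, obs):
        """Feed one frame's feature point; return True on recognition."""
        s = self.states[self.index]
        if _within(obs, s.center, s.radius):
            self.dwell += 1
            if self.dwell >= s.min_dwell:
                if self.index == len(self.states) - 1:
                    self.reset()
                    return True  # final state satisfied: gesture recognized
                self.index += 1  # advance to the next state
                self.dwell = 0
        else:
            self.reset()  # simplification: restart on a stray frame
        return False


# Hypothetical two-state "raise hand" gesture and a synthetic trajectory.
raise_hand = GestureFSM("raise_hand", [
    State(center=(100.0, 200.0), radius=20.0, min_dwell=2),  # hand low
    State(center=(100.0, 80.0), radius=20.0, min_dwell=2),   # hand high
])

trajectory = [(102, 198), (99, 205), (101, 85), (100, 78)]
for point in trajectory:
    if raise_hand.step(point):
        print("recognized:", raise_hand.name)
```

In an online setting, every gesture's FSM would receive each incoming frame in parallel; because a step is just a distance test and a counter update, this per-frame cost is what makes real-time operation feasible.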