A method for recovering the temporal structure and phases in natural gesture is presented. The work is motivated by recent developments in the theory of natural gesture that have identified several key aspects of gesture important to communication. In particular, gesticulation during conversation can be coarsely characterized as periods of bi-phasic or tri-phasic gesture separated by a rest state. First, we present an automatic procedure for hypothesizing plausible rest state configurations of a speaker; the method uses the repetition of subsequences to indicate potential rest states. Second, we develop a state-based parsing algorithm that is used both to select among candidate rest states and to parse an incoming video stream into bi-phasic and multi-phasic gestures. We present results from examples of speakers telling stories.
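The two-stage idea can be illustrated with a toy sketch. It assumes each video frame has already been quantized into a discrete pose symbol. A candidate rest state is taken to be a symbol that recurs most often as a sustained run, and the stream is then split into excursions away from that rest symbol. The function names, the `min_run` threshold, and the phase-counting rule in `classify` are hypothetical simplifications chosen for illustration. They are not the authors' rest-state procedure or their state-based parser.

```python
from collections import Counter
from itertools import groupby


def candidate_rest_states(symbols, min_run=3, top_k=2):
    """Rank pose symbols by how often they recur as runs of at least min_run frames.

    Hypothetical stand-in for "repetition of subsequences indicates a rest state".
    """
    counts = Counter()
    for sym, run in groupby(symbols):
        if sum(1 for _ in run) >= min_run:
            counts[sym] += 1
    return [sym for sym, _ in counts.most_common(top_k)]


def parse_excursions(symbols, rest):
    """Split the stream into maximal stretches of frames that are not the rest symbol."""
    excursions, current = [], []
    for sym in symbols:
        if sym == rest:
            if current:
                excursions.append(current)
                current = []
        else:
            current.append(sym)
    if current:
        excursions.append(current)
    return excursions


def classify(excursion):
    """Hypothetical rule: at most two distinct pose runs -> bi-phasic, else multi-phasic."""
    phases = sum(1 for _ in groupby(excursion))
    return "bi-phasic" if phases <= 2 else "multi-phasic"


if __name__ == "__main__":
    stream = list("RRRRABRRRRACDBRRRR")  # toy quantized pose stream
    rest = candidate_rest_states(stream)[0]
    for exc in parse_excursions(stream, rest):
        print("".join(exc), classify(exc))
```

On the toy stream, `R` is the only symbol that appears as long runs, so it is chosen as the rest state. The two excursions `AB` and `ACDB` are labeled bi-phasic and multi-phasic respectively. A real system would operate on continuous hand and arm features rather than single characters.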
Andrew D. Wilson, Aaron F. Bobick, Justine Cassell