This paper presents a method for representing two-person interactions at a semantic level with natural-language descriptions. A human interaction is composed of two single-person actions, which in turn are made up of torso and arm/leg motions. We adopt the 'verb argument structure' from linguistics to represent human action in terms of <agent-motion-target> triplets. Various two-person interactions are represented at a detailed level by multiple triplets aligned along a timeline according to the spatial and temporal constraints of the interactions. Our method provides a user-friendly natural-language description of various human interactions and describes positive, neutral, and negative interactions occurring between two persons.
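As an illustration only, the <agent-motion-target> triplet representation might be sketched as follows. The `Triplet` class, its `start`/`end` time fields, the `overlaps` and `describe` helpers, and the example "push" interaction are all assumptions made for this sketch; they are not the authors' implementation, and the sentence template is a deliberately simple stand-in for the natural-language description step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One <agent-motion-target> triplet with a hypothetical time interval."""
    agent: str
    motion: str
    target: str
    start: float  # assumed: interval start, in seconds
    end: float    # assumed: interval end, in seconds


def overlaps(a: Triplet, b: Triplet) -> bool:
    """Return True if the two triplets' time intervals intersect."""
    return a.start < b.end and b.start < a.end


def describe(timeline: list[Triplet]) -> list[str]:
    """Order triplets along the timeline and render each as a simple sentence."""
    ordered = sorted(timeline, key=lambda t: (t.start, t.end))
    return [f"{t.agent} {t.motion} {t.target}" for t in ordered]


# Hypothetical "push" interaction: two single-person actions on one timeline.
timeline = [
    Triplet("person B", "moves back from", "person A", 1.0, 2.0),
    Triplet("person A", "pushes", "person B", 0.5, 1.2),
]
sentences = describe(timeline)
```

Here the two triplets overlap in time, which is one way a temporal constraint between the two persons' actions could be checked before the description is generated.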
Jake K. Aggarwal, Sangho Park