This paper presents a method for detecting and recognizing social interactions in day-long first-person video of a social event, such as a trip to an amusement park. The location and orientation of each face are estimated and used to compute its line of sight. The context provided by all the faces in a frame is used to convert the lines of sight into locations in space to which individuals attend. Further, individuals are assigned roles based on their patterns of attention. The roles and locations of individuals are analyzed over time to detect and recognize the types of social interactions. Beyond patterns of face locations and attention, the head movements of the first-person camera wearer provide useful cues about the wearer's own attentional focus. We demonstrate encouraging results on detection and recognition of social interactions in first-person videos captured over multiple days in amusement parks.
Alireza Fathi, Jessica K. Hodgins, James M. Rehg