We propose a new graphical model, called the Sequential Interval Network (SIN), for parsing complex structured activities whose composition can be represented as a string-length-limited stochastic context-free grammar. By exploiting the grammar, the generated network captures the activity’s global temporal structure while avoiding a time-sliced model. In this network, the hidden variables are the timings of the component actions (i.e., when each action starts and ends), which allows reasoning about duration and observation at the interval/segmental level. Exact inference is possible and yields the posterior probabilities of the timings and of each frame’s label. We demonstrate this framework on vision tasks such as recognition and temporal segmentation of action sequences, as well as online parsing and future prediction when running in streaming mode.
Nam N. Vo, Aaron F. Bobick