We present a novel framework for recognizing repetitive
sequential events performed by human actors, where the events
have strong temporal dependencies and may overlap in parallel. Our
solution incorporates sub-event (or primitive) detectors and
a spatiotemporal model for sequential event changes. We
develop an effective and efficient method to integrate primitives
into a set of sequential events where strong temporal
constraints are imposed on the ordering of the primitives.
In particular, we formulate the combination process as an
optimization problem. A specialized Viterbi algorithm is designed
to learn and infer the target sequential events and
handle the event overlap simultaneously. To demonstrate
the effectiveness of the proposed framework, we report a detailed
quantitative analysis of a large set of cashier checkout
activities in a retail store.
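
To make the idea of combining primitives under ordering constraints concrete, the following is a minimal sketch of a textbook Viterbi decoder over a cyclic left-to-right state model. It is not the specialized algorithm proposed here: it does not handle overlapping events, and it does no learning. The function name `viterbi_ordered`, the integer primitive indices, and the per-frame log-scores are all hypothetical and chosen only for illustration. At each frame the decoder may either stay in the current primitive or advance to the next one in the cycle, so an out-of-order detection cannot break the sequence.

```python
import math


def viterbi_ordered(scores, n_states):
    """Decode the best primitive sequence under a strict cyclic ordering.

    scores[t][s] is an assumed log-score of primitive s at frame t
    (e.g. from a primitive detector). Allowed transitions are: stay in s,
    or advance from s to (s + 1) % n_states. The path must start in state 0.
    """
    n_frames = len(scores)
    neg_inf = -math.inf
    best = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    best[0][0] = scores[0][0]  # the sequence is assumed to start at primitive 0

    for t in range(1, n_frames):
        for s in range(n_states):
            prev_s = (s - 1) % n_states  # the only state that may advance into s
            stay = best[t - 1][s]
            advance = best[t - 1][prev_s]
            if stay >= advance:
                best[t][s] = stay + scores[t][s]
                back[t][s] = s
            else:
                best[t][s] = advance + scores[t][s]
                back[t][s] = prev_s

    # Backtrack from the best final state.
    last = max(range(n_states), key=lambda s: best[n_frames - 1][s])
    path = [last]
    for t in range(n_frames - 1, 0, -1):
        last = back[t][last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical scores for 3 primitives over 5 frames. At frame 1 the
# detector favors primitive 2, which would violate the ordering 0 -> 1 -> 2.
example_scores = [
    [0.0, -2.0, -2.0],
    [-1.0, -3.0, 0.0],
    [-2.0, 0.0, -2.0],
    [-2.0, -2.0, 0.0],
    [0.0, -2.0, -2.0],
]
print(viterbi_ordered(example_scores, 3))  # [0, 0, 1, 2, 0]
```

In this toy input the decoder overrides the out-of-order detection at frame 1 and keeps the actor in primitive 0 until primitive 1 is supported by the scores, which is the kind of behavior a strong ordering constraint is meant to enforce.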