We study the problem of event detection in realistic videos of repetitive sequential human activities. Despite the large body of work on event detection and recognition, very little of it addresses low-quality videos captured in realistic environments. Our framework casts detection as a shortest-path problem on a temporal-event graph constructed from the video content: graph vertices correspond to detected event primitives, and edge weights encode both generic knowledge of the event patterns and the discrepancy between event primitives, measured by a greedy matching of their visual features. Experimental results on videos collected from a retail environment validate the effectiveness of the proposed approach.
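The core computation described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes edge weights have already been computed (pattern cost plus feature discrepancy), runs Dijkstra's algorithm over the temporal-event graph, and includes a hypothetical `greedy_match_discrepancy` helper showing one way a greedy matching of visual features could yield a discrepancy term (it assumes scalar features and that the first list is no longer than the second).

```python
import heapq

def greedy_match_discrepancy(feats_a, feats_b):
    """Greedily pair each feature in feats_a with its closest still-unmatched
    feature in feats_b; return the total absolute matching cost.
    Hypothetical helper: scalar features, len(feats_a) <= len(feats_b)."""
    remaining = list(feats_b)
    total = 0.0
    for f in feats_a:
        j = min(range(len(remaining)), key=lambda k: abs(remaining[k] - f))
        total += abs(remaining.pop(j) - f)
    return total

def shortest_event_path(n_vertices, edges, source, sink):
    """Dijkstra over a temporal-event graph.

    Vertices 0..n_vertices-1 stand for detected event primitives;
    each edge (u, v, w) carries a precomputed weight w combining a
    pattern-based cost and a visual-feature discrepancy.
    Returns (path as a vertex list, total cost)."""
    adjacency = {v: [] for v in range(n_vertices)}
    for u, v, w in edges:
        adjacency[u].append((v, w))

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == sink:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adjacency[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    # Reconstruct the event sequence by walking predecessors back to source.
    path, v = [sink], sink
    while v != source:
        v = prev[v]
        path.append(v)
    return list(reversed(path)), dist[sink]
```

In this reading, the recovered shortest path is the event sequence that best balances conformance to the known event patterns against visual consistency between consecutive primitives.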