Cashiers in retail stores typically perform repetitive, periodic activities when processing items. Detecting these activities plays a key role in most retail fraud detection systems. In this paper, we propose a highly efficient, effective, and robust vision technique, based on a hierarchical finite state machine (FSM), for detecting checkout-related primitive activities. Our deterministic approach uses visual features and prior spatial constraints on hand motion to capture the particular motion patterns performed in primitive activities. We also apply our approach to the problem of retail fraud detection. Experimental results on a large set of video data captured from retail stores show that our approach, while much simpler and faster, achieves significantly better results than state-of-the-art machine-learning-based techniques, both in detecting checkout-related activities and in detecting checkout-related fraudulent incidents.
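
To make the idea of a hierarchical FSM concrete, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes three hypothetical regions of interest (`pickup`, `scan`, `drop`), a hypothetical per-frame motion-energy value for each region, and an arbitrary threshold of 0.5; the class and function names (`PrimitiveFSM`, `VisitFSM`, `detect_visits`) are likewise illustrative. A low-level FSM per region reports when the hand enters and leaves that region, and a high-level FSM accepts those primitives only in a fixed order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Primitive = Tuple[str, int, int]  # (region, start_frame, end_frame)


@dataclass
class PrimitiveFSM:
    """Low-level FSM: detects one hand entry/exit in a single region."""
    region: str
    threshold: float = 0.5  # hypothetical motion-energy threshold (assumption)
    state: str = "idle"     # cycles idle -> active -> idle
    start: int = -1

    def step(self, frame: int, energy: float) -> Optional[Primitive]:
        """Feed one frame; return a primitive when the hand leaves the region."""
        if self.state == "idle" and energy >= self.threshold:
            self.state, self.start = "active", frame
        elif self.state == "active" and energy < self.threshold:
            self.state = "idle"
            return (self.region, self.start, frame)
        return None


@dataclass
class VisitFSM:
    """High-level FSM: accepts primitives in the order pickup, scan, drop."""
    order: Tuple[str, ...] = ("pickup", "scan", "drop")  # assumed ordering
    index: int = 0
    current: List[Primitive] = field(default_factory=list)

    def step(self, primitive: Primitive) -> Optional[List[Primitive]]:
        region = primitive[0]
        if region == self.order[self.index]:
            # Expected primitive: advance to the next state.
            self.current.append(primitive)
            self.index += 1
        elif region == self.order[0]:
            # A new first primitive restarts the sequence.
            self.current, self.index = [primitive], 1
        else:
            # Out-of-order primitive: discard the partial sequence.
            self.current, self.index = [], 0
        if self.index == len(self.order):
            done = self.current
            self.current, self.index = [], 0
            return done
        return None


def detect_visits(frames: List[Dict[str, float]]) -> List[List[Primitive]]:
    """Run both FSM levels over per-frame motion energies keyed by region."""
    low = {r: PrimitiveFSM(r) for r in ("pickup", "scan", "drop")}
    high = VisitFSM()
    visits = []
    for t, energies in enumerate(frames):
        for region, fsm in low.items():
            primitive = fsm.step(t, energies.get(region, 0.0))
            if primitive is not None:
                visit = high.step(primitive)
                if visit is not None:
                    visits.append(visit)
    return visits
```

In this sketch, a frame sequence in which the hand is active in `pickup`, then `scan`, then `drop` yields one complete sequence, whereas a `pickup` followed directly by a `drop` yields none. How such incomplete sequences relate to fraudulent incidents is outside the scope of the sketch.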