We develop a robust technique for matching similar human actions in video. Given a query video, Motion History Images (MHIs) are constructed over consecutive keyframes. Each MHI is then divided into local Motion-Shape regions, which lets us analyze the action as a set of sparse space-time patches in 3D. Inspired by the Generalized Hough Transform, we develop an Implicit Motion-Shape Model that integrates these local patches to describe the dynamic characteristics of the query action. Motion segments are extracted from candidate videos in the same way and projected onto the Hough space built by the query model; a matching score is then obtained by Parzen-window density estimation at multiple scales. Experiments on popular datasets demonstrate the effectiveness and efficiency of this approach: highly accurate matches are returned within acceptable processing time.
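To make two of the named building blocks concrete, the sketch below gives a minimal NumPy illustration of (a) the standard MHI update rule (motion pixels are set to a duration tau and decay otherwise) and (b) a Gaussian Parzen-window score between two sets of Hough-space votes. This is an assumed, simplified rendering for illustration only, not the paper's implementation; the function names, thresholds, and the 3-D vote representation are hypothetical.

```python
import numpy as np

def motion_history_image(frames, tau=15, diff_thresh=30):
    """Build a Motion History Image from a sequence of grayscale frames.

    Pixels whose frame-to-frame difference exceeds diff_thresh are set to
    tau; all other pixels decay by 1 per step (classic MHI update rule).
    """
    mhi = np.zeros_like(frames[0], dtype=np.float32)
    prev = frames[0].astype(np.float32)
    for frame in frames[1:]:
        cur = frame.astype(np.float32)
        motion = np.abs(cur - prev) > diff_thresh
        mhi = np.where(motion, float(tau), np.maximum(mhi - 1.0, 0.0))
        prev = cur
    return mhi

def parzen_score(query_votes, candidate_votes, bandwidth=1.0):
    """Score a candidate by Gaussian Parzen-window density of its Hough-space
    votes evaluated at the query model's vote locations (assumed d-dim votes).
    """
    q = np.asarray(query_votes, dtype=np.float64)      # shape (M, d)
    c = np.asarray(candidate_votes, dtype=np.float64)  # shape (N, d)
    d2 = ((q[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return kernel.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for keyframes and vote sets, for demonstration only.
    frames = [rng.integers(0, 255, (64, 64), dtype=np.uint8) for _ in range(10)]
    print("MHI max:", motion_history_image(frames).max())
    score = parzen_score(rng.normal(size=(20, 3)), rng.normal(size=(50, 3)))
    print("Parzen matching score:", round(score, 4))
```

Evaluating parzen_score at several bandwidth values mirrors the multi-scale density estimation mentioned above: a candidate that scores consistently well across scales is a stronger match than one that only peaks at a single bandwidth.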