In this paper, we present a method that recognizes one or more common actions shared by a pair of video sequences. We define an energy function that evaluates geometric and photometric consistency, and we cast action recognition as the optimization of this energy function. The proposed stochastic inference algorithm, based on the Monte Carlo method, explores the video pair starting from matches between local spatiotemporal interest points to find the common actions. Our algorithm works in an unsupervised way, without prior knowledge of the type or the number of common actions. Experiments show that our algorithm produces promising results on both single and multiple action recognition.
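As a rough illustration of the idea of optimizing an energy over interest point matches with Monte Carlo sampling, the following minimal sketch uses a generic Metropolis sampler over subsets of candidate matches. It is not the paper's formulation: the energy terms, the names `energy` and `metropolis_select`, the parameters `lam`, `reward`, and `temperature`, and the toy data are all assumptions made for this example.

```python
import math
import random


def energy(selected, matches, lam=1.0, reward=2.0):
    """Toy energy over a subset of candidate matches (lower is better).

    Each match is (p, q, d): p and q are (x, y, t) interest points in the
    two videos, and d is a descriptor distance. All terms are assumptions.
    """
    chosen = [matches[i] for i in sorted(selected)]
    # Photometric term: sum of descriptor distances of the selected matches.
    photometric = sum(d for _, _, d in chosen)
    # Geometric term: pairwise disagreement between match displacements.
    geometric = 0.0
    for i in range(len(chosen)):
        for j in range(i + 1, len(chosen)):
            (p1, q1, _), (p2, q2, _) = chosen[i], chosen[j]
            v1 = [b - a for a, b in zip(p1, q1)]
            v2 = [b - a for a, b in zip(p2, q2)]
            geometric += math.dist(v1, v2)
    # The reward term keeps the empty selection from being trivially optimal.
    return photometric + lam * geometric - reward * len(chosen)


def metropolis_select(matches, steps=5000, temperature=0.5, seed=0):
    """Metropolis sampling: toggle one match per step, track the best subset."""
    rng = random.Random(seed)
    current = set()
    e_cur = energy(current, matches)
    best, e_best = set(current), e_cur
    for _ in range(steps):
        i = rng.randrange(len(matches))
        proposal = current ^ {i}  # add match i if absent, remove it if present
        e_new = energy(proposal, matches)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if e_new <= e_cur or rng.random() < math.exp((e_cur - e_new) / temperature):
            current, e_cur = proposal, e_new
            if e_cur < e_best:
                best, e_best = set(current), e_cur
    return best, e_best


# Hypothetical toy data: four matches sharing one displacement, two outliers.
MATCHES = [
    ((10, 10, 0), (15, 10, 2), 0.2),
    ((12, 14, 1), (17, 14, 3), 0.2),
    ((20, 11, 2), (25, 11, 4), 0.2),
    ((18, 16, 3), (23, 16, 5), 0.2),
    ((30, 30, 0), (10, 40, -5), 2.5),
    ((5, 40, 4), (45, 2, 9), 2.5),
]

best_subset, best_energy = metropolis_select(MATCHES)
```

In this toy setting, the geometrically consistent matches lower the energy while the outliers raise it, so the sampler is expected to settle on the consistent group. A real system would need an energy that can separate several distinct common actions rather than a single displacement.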