In this paper, we propose a novel Motion Coherence Spatiotemporal Interest Point (MC-STIP) detector based on the coherent motion pattern around each voxel of a video. Our detector defines interest points as the local peaks of optical flow in the motion coherence volume of a video. Each interest point is described by a descriptor formed by concatenating histograms of 2D gradients. Moreover, we introduce a Topic Matrix Video Representation (T-Mat). This representation not only captures the global hidden topics but also preserves the discriminative information shared among the interest point descriptors. We evaluate our approach on human action recognition using three benchmark datasets and Support Vector Machines with four different kernels. The experiments demonstrate the effectiveness of the proposed approach.
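The abstract does not define the motion coherence measure itself. The sketch below therefore assumes a precomputed 3-D volume of per-voxel scores indexed as (t, y, x). It illustrates only the generic step of selecting strict local maxima ("local peaks") within a 3×3×3 neighbourhood, with an optional threshold. The function name `local_peaks_3d`, the threshold, and the neighbourhood size are hypothetical choices for illustration and do not reproduce the authors' implementation.

```python
import numpy as np


def local_peaks_3d(volume, threshold=0.0):
    """Return (t, y, x) indices of strict local maxima in a 3-D score volume.

    Hypothetical sketch: a voxel counts as a peak if its score exceeds
    `threshold` and is strictly greater than all 26 neighbours in its
    3x3x3 neighbourhood. The scoring of the volume is assumed, not given.
    """
    v = np.asarray(volume, dtype=float)
    T, H, W = v.shape
    # Pad with -inf so voxels on the border compare only against real neighbours.
    padded = np.pad(v, 1, mode="constant", constant_values=-np.inf)
    is_peak = v > threshold
    for dt in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dt == 0 and dy == 0 and dx == 0:
                    continue  # skip the centre voxel itself
                # Shifted view: neighbour[t, y, x] == v[t + dt, y + dy, x + dx] (or -inf).
                neighbour = padded[1 + dt:1 + dt + T,
                                   1 + dy:1 + dy + H,
                                   1 + dx:1 + dx + W]
                is_peak &= v > neighbour
    # np.argwhere returns indices in row-major (t, y, x) order.
    return [tuple(int(i) for i in idx) for idx in np.argwhere(is_peak)]
```

For example, a 5×5×5 zero volume with isolated values at (2, 2, 2) and (0, 4, 4) yields both voxels as peaks when the threshold is below their scores, whereas a constant volume yields none, because the comparison with neighbours is strict.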