Whereas most existing action recognition methods require computationally demanding feature extraction and/or classification, this paper presents a novel real-time solution that utilises local appearance and structural information. Semantic texton forests (STFs) are applied to local space-time volumes as a powerful discriminative codebook. Since STFs act directly on video pixels without using expensive descriptors, visual codeword generation by STFs is extremely fast. To capture the structural information of actions, the so-called pyramidal spatiotemporal relationship match (PSRM) is introduced. Leveraging the hierarchical structure of STFs, the pyramid match kernel is applied to obtain robust structural matching while avoiding quantisation effects. We propose a kernel k-means forest classifier that uses PSRM to perform classification. In experiments on the KTH and the recent UT-interaction data sets, we demonstrate real-time performance as well as state-of-the-art accuracy by the proposed meth...
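To make the role of the pyramid match kernel concrete, the following is a minimal sketch of the generic kernel computed over a tree-structured codebook. It is not the PSRM formulation itself. It assumes a balanced binary tree whose leaves are ordered left to right, so that a coarser histogram is obtained by merging adjacent bins. The helper names `coarsen`, `build_pyramid` and `pyramid_match` and the example counts are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Number of matches between two histograms with the same bins."""
    return float(sum(min(x, y) for x, y in zip(a, b)))


def coarsen(hist: Sequence[float]) -> List[float]:
    """Merge adjacent bin pairs, i.e. move one level up a balanced binary tree.

    Assumption: leaves are ordered so that siblings are adjacent.
    """
    return [hist[i] + hist[i + 1] for i in range(0, len(hist), 2)]


def build_pyramid(leaf_hist: Sequence[float]) -> List[List[float]]:
    """Return histograms from the finest (leaf) level up to the root."""
    n = len(leaf_hist)
    if n == 0 or n & (n - 1):
        raise ValueError("number of leaf bins must be a power of two")
    levels = [list(leaf_hist)]
    while len(levels[-1]) > 1:
        levels.append(coarsen(levels[-1]))
    return levels


def pyramid_match(px: List[List[float]], py: List[List[float]]) -> float:
    """Generic pyramid match score: new matches at level i weighted by 1 / 2**i.

    Level 0 is the finest level, so matches found deep in the tree count most.
    """
    if len(px) != len(py):
        raise ValueError("pyramids must have the same number of levels")
    score = 0.0
    previous = 0.0
    for level, (hx, hy) in enumerate(zip(px, py)):
        matches = histogram_intersection(hx, hy)
        # Only matches that were not already found at a finer level are new.
        score += (matches - previous) / (2 ** level)
        previous = matches
    return score


# Hypothetical leaf-level codeword counts for two video clips.
clip_a = build_pyramid([2, 0, 1, 1])
clip_b = build_pyramid([1, 1, 0, 2])
similarity = pyramid_match(clip_a, clip_b)
```

Because matches are counted at several resolutions of the hierarchy, two codewords that fall into different leaves but share an ancestor still contribute a reduced score, which illustrates how a hard quantisation boundary is softened.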