This paper proposes a novel algorithm for categorizing action video sequences using unsupervised dual clustering. Given a video database, we extract the motion information of each action and apply nonlinear dimensionality reduction to address both the high dimensionality of silhouette features and the nonlinearity of articulated human actions. K-means clustering is first performed on the frame-wise features in the embedding space, converting each video in the database into a sequence of labels, each of which corresponds to one of k “key” feature frames. The dissimilarity between any two label sequences is then measured using edit distance. Finally, the resulting pairwise dissimilarity matrix is passed to a spectral clustering algorithm to obtain the category label of each action video. Experimental results on two recent data sets demonstrate the effectiveness and efficiency of the proposed algorithm.
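
As a rough illustration of the second stage described above, the following minimal sketch computes the edit distance between label sequences and assembles the pairwise dissimilarity matrix. It assumes unit costs for insertion, deletion, and substitution, which the text does not specify; the function names `edit_distance` and `dissimilarity_matrix` and the toy label sequences are hypothetical and are not taken from the proposed implementation.

```python
from typing import List, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two label sequences.

    Assumption: insertion, deletion, and substitution all cost 1.
    """
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x from a
                            curr[j - 1] + 1,      # insert y into a
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def dissimilarity_matrix(sequences: List[Sequence]) -> List[List[int]]:
    """Symmetric matrix of pairwise edit distances with a zero diagonal."""
    n = len(sequences)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = edit_distance(sequences[i], sequences[j])
    return D


# Hypothetical label sequences, one per video, as k-means might produce.
videos = [
    [0, 0, 1, 2, 2],
    [0, 1, 1, 2],
    [3, 3, 4, 4, 3],
]
D = dissimilarity_matrix(videos)
```

In this toy example the first two sequences share most of their key-frame labels and end up close to each other, while the third shares none and is far from both. A spectral clustering step would typically need these dissimilarities converted into affinities first; how that conversion is done is left open here.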