We present a new method for segmenting actions into primitives and classifying them into a hierarchy of action classes. Our scheme learns action classes in an unsupervised manner from examples recorded by multiple cameras. Segmentation and clustering of action classes are based on a recently proposed motion descriptor that can be extracted efficiently from reconstructed volume sequences. Because the representation is independent of viewpoint, the resulting segmentation and classification methods are efficient and robust to viewpoint changes. Our method can serve as the first step in a semi-supervised action recognition system that automatically breaks training examples of people performing sequences of actions into primitive actions, which can then be discriminatively classified and assembled into high-level recognizers.
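
To make the segment-then-cluster pipeline concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm. It assumes per-frame motion-descriptor vectors have already been extracted from the volume sequence, places segment boundaries at minima of a smoothed motion-energy signal, and substitutes flat k-means for the hierarchical clustering of action classes; all function and parameter names are hypothetical.

```python
import numpy as np
from scipy.signal import argrelmin
from sklearn.cluster import KMeans

def segment_and_cluster(descriptors, n_classes=5, smooth=5):
    """Sketch of unsupervised action discovery.

    descriptors: (T, D) array with one motion-descriptor vector per frame
                 (assumed precomputed from the reconstructed volumes).
    Returns a list of (start_frame, end_frame, class_label) triples.
    """
    # Motion energy per frame; low energy is taken as a cue for a
    # boundary between action primitives (an assumption of this sketch).
    energy = np.linalg.norm(descriptors, axis=1)
    kernel = np.ones(smooth) / smooth
    energy = np.convolve(energy, kernel, mode="same")  # smooth before finding minima
    boundaries = argrelmin(energy)[0]
    cuts = [0, *boundaries.tolist(), len(descriptors)]

    # One fixed-length feature per primitive: the mean descriptor over its frames.
    segments = [(a, b) for a, b in zip(cuts[:-1], cuts[1:]) if b > a]
    features = np.stack([descriptors[a:b].mean(axis=0) for a, b in segments])

    # Unsupervised grouping of primitives into action classes
    # (flat k-means stands in for the hierarchy described above).
    labels = KMeans(n_clusters=n_classes, n_init=10).fit_predict(features)
    return [(a, b, int(l)) for (a, b), l in zip(segments, labels)]
```

Under this reading, the per-segment labels would supply the automatically discovered primitives that a semi-supervised system could then refine into discriminative classifiers.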