We present a fully automatic approach to facial expression recognition that represents facial motion with a vocabulary of local motion descriptors. Previous studies have shown that motion alone is sufficient for recognizing expressions. Moreover, because appearance is discarded after optical flow estimation, our representation is invariant to the subjects' ethnic background, facial hair, and other confounders. Unlike most facial expression recognition approaches, ours is general and not specifically tailored to faces. The annotation effort for training is minimal: the user does not have to label frames according to the phase of the expression or identify facial features, and only a single expression label per sequence is required. We show results on a database of 600 video sequences.
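
As a rough illustration of what a vocabulary-based motion representation can look like, the sketch below assigns each local motion descriptor to its nearest vocabulary entry and summarizes a sequence as a normalized histogram of those assignments. This is an assumption made for illustration only: the descriptor design, the vocabulary construction, and the classifier are not specified here, and the names `quantize`, `sequence_histogram`, `vocabulary`, and `descriptors` are hypothetical, as are the toy two-dimensional values.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Assign each descriptor to its nearest vocabulary word (Euclidean distance)."""
    # Pairwise squared distances, shape (n_descriptors, n_words).
    diff = descriptors[:, None, :] - vocabulary[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def sequence_histogram(descriptors, vocabulary):
    """Normalized histogram of word occurrences for one video sequence."""
    words = quantize(descriptors, vocabulary)
    counts = np.bincount(words, minlength=len(vocabulary)).astype(float)
    # Guard against an empty sequence (no descriptors extracted).
    return counts / max(counts.sum(), 1.0)

# Toy data (hypothetical): a 3-word vocabulary and 4 descriptors in 2-D.
vocabulary = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.8]])

hist = sequence_histogram(descriptors, vocabulary)
print(hist)  # one fixed-length vector per sequence, regardless of its duration
```

Under this assumed scheme, each sequence maps to one fixed-length vector, which is why a single expression label per sequence would be enough to train a standard classifier on top of it.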