We present a framework for unsupervised detection of nonverbal behavioral cues—hand gestures, pose, body movements, etc.—from a collection of motion capture (MoCap) sequences in a public speaking setting. We extract the cues by solving a sparse and shift-invariant dictionary learning problem, known as shift-invariant sparse coding. We find that the extracted behavioral cues are human-interpretable in the context of public speaking. Our technique can automatically identify common patterns of body movements and the time instances of their occurrences, minimizing the time and effort needed for manual detection and coding of nonverbal human behaviors.

Categories and Subject Descriptors
G.1 [NUMERICAL ANALYSIS]: Optimization; J.5 [ARTS AND HUMANITIES]: Performing arts

Keywords
Public Speaking; Action Recognition; Unsupervised Analysis; Sparsity; Shift-Invariant Sparse Coding
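To make the core idea concrete, the following is a minimal illustrative sketch of shift-invariant sparse coding with a *fixed* dictionary, solved greedily by convolutional matching pursuit: each detected atom is a (kernel, shift, amplitude) triple, so the same movement pattern can be found wherever it occurs in time. This is an assumption-laden toy on 1-D signals, not the paper's implementation (which also learns the dictionary); the function and variable names are ours.

```python
import numpy as np

def conv_matching_pursuit(signal, kernels, n_atoms):
    """Greedy shift-invariant sparse coding (illustrative sketch).

    Repeatedly finds the kernel and time shift whose correlation with
    the residual is largest, records (kernel index, shift, amplitude),
    and subtracts that scaled, shifted kernel from the residual.
    """
    residual = signal.astype(float).copy()
    codes = []  # list of (kernel index, shift, amplitude)
    for _ in range(n_atoms):
        best = None  # (kernel index, shift, raw correlation)
        for k, d in enumerate(kernels):
            # correlate the residual with kernel k at every valid shift
            corr = np.correlate(residual, d, mode="valid")
            t = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[t]) > abs(best[2]):
                best = (k, t, corr[t])
        k, t, c = best
        d = kernels[k]
        amp = c / np.dot(d, d)  # least-squares amplitude for this atom
        residual[t:t + len(d)] -= amp * d
        codes.append((k, t, amp))
    return codes, residual

# Toy usage: one movement "template" occurring at two times.
template = np.hanning(16)
x = np.zeros(100)
x[10:26] += 2.0 * template
x[60:76] += 1.0 * template
codes, res = conv_matching_pursuit(x, [template], n_atoms=2)
# codes recovers the two occurrences at shifts 10 and 60.
```

In the paper's setting the signals are multi-dimensional MoCap joint trajectories and the kernels themselves are learned, but the decomposition into recurring, time-shifted atoms is the same idea.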