This paper addresses the problem of structuralizing education and training videos for high-level semantics extraction and nonlinear media presentation in e-learning applications. Drawing on production knowledge in instructional media, we propose six main narrative structures that education and training videos employ for both motivation and demonstration during learning and practical training. We devise an audiovisual feature set and a hierarchical decision-tree-based classification system to discriminate between these structures. Using a two-tiered hierarchical model, we achieve a classification accuracy of 84.7% on a comprehensive set of education and training video data.