Abstract. Colonoscopy is an important screening procedure for colorectal cancer. During the procedure, the endoscopist visually inspects the colon. Currently, no content-based analysis and retrieval system automatically analyzes videos captured during colonoscopic procedures and provides user-friendly, efficient access to their important content. Such a system would be valuable as an educational resource for endoscopic research, a platform for assessing endoscopists' procedural skills, and a platform for mining unknown abnormality patterns that may lead to colorectal cancer. The first necessary step in this analysis is parsing the video into semantic units. In this paper, we propose a new visual model approach that combines visual features extracted directly from compressed videos with audio analysis to discover important semantic units called scenes. Our experimental results show average precision and recall of 93% and 85%, respectively.