In this paper, we discuss our recent research on, and open issues in, the structural and semantic analysis of digital video. Specifically, we focus on segmentation, summarization, and classification. In each area, we emphasize the importance of understanding domain-specific characteristics. For scene segmentation, we introduce the idea of a computable scene: a chunk of audio-visual data that exhibits long-term consistency with regard to several audio-visual properties. For summarization, we discuss shot-level and program-level summaries. For classification, we describe schemes based on Bayesian networks, which model the interaction of multiple classes at different levels using multimedia data. We also discuss classification techniques that exploit domain-specific spatial structural constraints as well as temporal transition models.