We review a memory-based buffer model of visual perception that combines the lower and middle stages in the analysis of video. This model was originally developed to detect the breaks between physical scenes ("story units"). In this paper we show how the method can also be applied to shot detection. Moreover, it gives rise to a unified approach for detecting cuts and gradual changes. Additionally, as a straightforward corollary of its definition, it leads to a more natural definition of key frames. We derive the theoretical performance of the model, given arbitrary measures of frame-to-frame dissimilarity. We show several examples of its response, using standard color histogram differencing (with the ??? norm as measure). We evaluate the model's performance in detecting cuts and dissolves against a complete, hand-segmented situation comedy.
Aya Aner, John R. Kender