Automatically determining what constitutes a scene in a video is a challenging task, particularly since there is no precise definition of the term "scene". It is left to the individual to choose the attributes shared by consecutive shots that group them into scenes. Certain basic attributes, such as dialogs, similar settings, and continuing sounds, are consistent indicators. We have therefore developed a scheme for identifying scenes that clusters shots according to detected dialogs, similar settings, and similar audio. Experimental results show that automatic identification of these types of scenes is reliable.
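
To make the idea of grouping shots by shared attributes concrete, the following is a minimal sketch and not the scheme evaluated here. It assumes that upstream detectors have already labeled each shot with a dialog, setting, and audio identifier. The names `Shot`, `linked`, `group_into_scenes`, and the `lookahead` parameter are hypothetical and chosen only for illustration. Two shots are treated as linked when they share any label, and a scene is the maximal run of consecutive shots covered by overlapping links.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shot:
    # Hypothetical labels assumed to come from upstream detectors;
    # None means the detector found nothing for this shot.
    dialog_id: Optional[int] = None
    setting_id: Optional[str] = None
    audio_id: Optional[int] = None


def linked(a: Shot, b: Shot) -> bool:
    """Return True if the two shots share at least one non-None label."""
    pairs = (
        (a.dialog_id, b.dialog_id),
        (a.setting_id, b.setting_id),
        (a.audio_id, b.audio_id),
    )
    return any(x is not None and x == y for x, y in pairs)


def group_into_scenes(shots: List[Shot], lookahead: int = 3) -> List[List[int]]:
    """Group consecutive shot indices into scenes.

    Each shot is compared with up to `lookahead` following shots. A link
    between shot i and shot k pulls every shot from i to k into one scene.
    """
    scenes: List[List[int]] = []
    start = 0  # first shot of the scene being built
    end = 0    # last shot known to belong to that scene
    for i, shot in enumerate(shots):
        if i > end:
            # No link reaches shot i, so the previous scene is complete.
            scenes.append(list(range(start, end + 1)))
            start = end = i
        last = min(i + lookahead, len(shots) - 1)
        for k in range(i + 1, last + 1):
            if linked(shot, shots[k]):
                end = max(end, k)
    if shots:
        scenes.append(list(range(start, end + 1)))
    return scenes


example = [
    Shot(setting_id="kitchen"),
    Shot(setting_id="street"),
    Shot(setting_id="kitchen"),  # same setting as shot 0
    Shot(dialog_id=1),
    Shot(dialog_id=1),           # same dialog as shot 3
    Shot(audio_id=7),
]
scenes = group_into_scenes(example)
```

In this toy input, the intervening street shot is absorbed into the kitchen scene because the link between shots 0 and 2 spans it. The window size trades off tolerance for such interleaved shots against the risk of merging unrelated scenes.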