How do people make sense of a video after viewing only a few of its frames? What elements constitute the "visual gist" in their minds? Answers to these questions have implications for both content-based video retrieval and the interface design (e.g., key-frame selection) of digital video libraries. A preliminary study with 45 subjects was conducted to explore these questions. After viewing a fast-forward surrogate, the subjects were asked to choose pictures that they thought would "belong to" the video and to think aloud while making their selections. Nine visual gist attributes (e.g., people, objects, and actions) were derived using the grounded theory method, and their frequencies were compared and analyzed.

Author Keywords
Visual gist understanding, video retrieval, video surrogate, fast forward, user studies.

ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): Miscell...