Many web documents are dynamic, with content changing in varying amounts at varying frequencies. However, current document search algorithms have a static view of the document content, with only a single version of the document in the index at any point in time. In this paper, we present the first published analysis of using the temporal dynamics of document content to improve relevance ranking. We show that there is a strong relationship between the amount and frequency of content change and relevance. We develop a novel probabilistic document ranking algorithm that allows differential weighting of terms based on their temporal characteristics. By leveraging such content dynamics we show significant performance improvements for navigational queries.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Miscellaneous

General Terms
Algorithms, Experimentation

Keywords
Web search, versioned documents, temporal change
Jonathan L. Elsas, Susan T. Dumais
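
To make the idea of differential term weighting concrete, the following is a minimal illustrative sketch, not the ranking model developed in the paper. It assumes a document is available as a list of text snapshots, measures how persistently each term appears across those snapshots, and scores a query with a two-component mixture of unigram models. The function names (`persistence`, `score`), the 0.5 persistence cutoff, the mixture weights, and the smoothing constant are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()


def persistence(versions):
    """Fraction of snapshots in which each term appears at least once."""
    seen = Counter()
    for text in versions:
        seen.update(set(tokenize(text)))
    return {term: count / len(versions) for term, count in seen.items()}


def score(query, versions, lam_persistent=0.8, lam_transient=0.2,
          cutoff=0.5, eps=1e-6):
    """Log-likelihood of the query under a mixture of two unigram models.

    Terms present in at least `cutoff` of the snapshots feed the
    "persistent" model; all other terms feed the "transient" model.
    All parameter values are illustrative assumptions.
    """
    pers = persistence(versions)
    tokens = [t for text in versions for t in tokenize(text)]
    stable = Counter(t for t in tokens if pers[t] >= cutoff)
    fleeting = Counter(t for t in tokens if pers[t] < cutoff)
    n_stable = sum(stable.values()) or 1      # avoid division by zero
    n_fleeting = sum(fleeting.values()) or 1
    total = 0.0
    for q in tokenize(query):
        p = (lam_persistent * stable[q] / n_stable
             + lam_transient * fleeting[q] / n_fleeting)
        total += math.log(p + eps)            # eps smooths unseen terms
    return total


# Hypothetical snapshots of one page captured at three points in time.
versions = [
    "acme home products news monday",
    "acme home products news tuesday",
    "acme home products sale",
]
```

With these toy snapshots, weighting the persistent component more heavily ranks the query "acme" above "sale", while equal weights reverse that order, which shows how the choice of temporal weights can change a ranking.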