The application of document clustering to information retrieval has been motivated by the potential effectiveness gains postulated by the Cluster Hypothesis. The hypothesis states that relevant documents tend to be highly similar to each other, and therefore tend to appear in the same clusters. In this paper we propose that, for any given query, pairs of relevant documents will exhibit an inherent similarity which is dictated by the query itself. Our research describes an attempt to devise means by which this similarity can be detected. We propose the use of query-sensitive similarity measures that bias interdocument relationships towards pairs of documents that jointly possess attributes that are expressed in a query. We experimentally tested query-sensitive measures against conventional ones that do not take the context of the query into account. We calculated interdocument relationships for varying numbers of top-ranked documents for five document collections. Our results show a con...
Anastasios Tombros, C. J. van Rijsbergen