When investigating alternative estimates for term discriminativeness, we discovered that relevance information and idf are much closer related than formulated in classical literature. Therefore, we revisited the justification of idf as it follows from the binary independent retrieval (BIR) model. The main result is a formal framework uncovering the close relationship of a generalised idf and the BIR model. The framework makes explicit how to incorporate relevance information into any retrieval function that involves an idf component. In addition to the idf -based formulation of the BIR model, we propose Poisson-based estimates as an alternative to the classical estimates, this being motivated by the superiority of Poisson-based estimates for the within-document term frequencies. The main experimental finding is that a Poisson-based idf is superior to the classical idf , where the superiority is particularly evident for long queries. Categories and Subject Descriptors: H.3.3 [Informa...
Arjen P. de Vries, Thomas Rölleke