Data fusion has been widely studied in the information retrieval community and is an effective technique for improving retrieval effectiveness. Because reliable relevance scores are very often unavailable for component results, this paper investigates how to model the relationship between a document's rank in a resultant document list and its probability of relevance, for use in data fusion. We apply statistical regression to this problem: several regression models are compared, and two of them, the cubic and logistic models, are selected from the group of candidates. Experiments with three groups of results submitted to TREC demonstrate that the cubic and logistic models work better than the linear model and are as good as methods that use scoring information.
Shengli Wu, Yaxin Bi, Sally I. McClean
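
The rank-based approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a logistic curve over the logarithm of the rank, with placeholder coefficients `a` and `b` that would in practice be estimated by regression on training data, and it assumes fusion by summing the estimated probabilities across component lists. The function names `logistic_rank_to_prob` and `fuse` are hypothetical.

```python
import math
from collections import defaultdict


def logistic_rank_to_prob(rank, a=1.0, b=0.75):
    """Estimate the probability of relevance at a 1-based rank.

    Assumption: a logistic curve over ln(rank). The coefficients a and b
    are illustrative placeholders, not fitted or published values.
    """
    return 1.0 / (1.0 + math.exp(a + b * math.log(rank)))


def fuse(ranked_lists, rank_to_prob=logistic_rank_to_prob):
    """Fuse ranked lists of document ids without using retrieval scores.

    Each document receives the sum of its rank-derived probabilities over
    all lists in which it appears (an assumed, CombSUM-style combination).
    Ties are broken by document id so the output is deterministic.
    """
    scores = defaultdict(float)
    for docs in ranked_lists:
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += rank_to_prob(rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # Three hypothetical component results, best document first.
    runs = [
        ["d1", "d2", "d3"],
        ["d2", "d1", "d4"],
        ["d2", "d3", "d1"],
    ]
    print(fuse(runs))
```

Passing a different function as `rank_to_prob`, for example a cubic polynomial in the rank, changes the model without touching the fusion step.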