Score distribution models: assumptions, intuition, and robustness to score manipulation

Inferring the score distribution of relevant and non-relevant documents is an essential task for many IR applications (e.g. information filtering, recall-oriented IR, meta-search, distributed IR). Modeling score distributions in an accurate manner is the basis of any inference. Thus, numerous score distribution models have been proposed in the literature. Most of the models were proposed on the basis of empirical evidence and goodness-of-fit. In this work, we model score distributions in a rather different, systematic manner. We start with a basic assumption on the distribution of terms in a document. Following the transformations applied on term frequencies by two basic ranking functions, BM25 and Language Models, we derive the distribution of the produced scores for all documents. Then we focus on the relevant documents. We detach our analysis from particular ranking functions. Instead, we consider a model for precision-recall curves, and given this model, we present a general mathe...

Documents | Information Technology | Relevant Documents | Score Distribution | SIGIR 2010 |

Added	06 Dec 2010
Updated	06 Dec 2010
Type	Conference
Year	2010
Where	SIGIR
Authors	Evangelos Kanoulas, Keshi Dai, Virgiliu Pavlu, Javed A. Aslam
