New types of document collections are being developed by various web services. The service providers keep track of non-textual features such as click counts. In this paper, we present a framework that uses non-textual features to predict the quality of documents. We also show that our quality measure can be successfully incorporated into the language modeling-based retrieval model. We test our approach on a collection of question and answer pairs gathered from a community-based question answering service. Experimental results using our quality measure show a significant improvement over our baseline.

Categories and Subject Descriptors
H.3.0 [Information Search and Retrieval]: General

General Terms
Algorithms, Measurement, Experimentation

Keywords
Information Retrieval, Language Models, Document Quality, Maximum Entropy
Jiwoon Jeon, W. Bruce Croft, Joon Ho Lee, Soyeon P