This paper presents the results of the State University of New York at Buffalo (UB) in the monolingual and multilingual tasks at CLEF 2004. For these tasks we used an approach based on statistical language modeling. Our ad hoc retrieval work used the TAPIR toolkit, developed in-house by M. Srikanth. Our approach focused on validating and adapting the language modeling system to work in a multilingual environment, and on exploring ways to merge results from multiple collections into a single list of results. We explored the use of a measure of query ambiguity, also known as the clarity score, for merging the results of the individual collections into a single list of retrieved documents. Our results indicate that using clarity scores normalized across queries gives statistically significant improvements over using a fixed merging order.
Miguel E. Ruiz, Munirathnam Srikanth