We present a new method for information retrievalusing hidden Markov models (HMMs). We develop a general framework for incorporating multiple word generation mechanisms within the same model. We then demonstrate that an extremely simple realization of this model substantially outperforms standard tf :idf ranking on both the TREC-6 and TREC-7 ad hoc retrieval tasks. We go on to present a novel method for performing blind feedback in the HMM framework, a more complex HMM that models bigram production, and several other algorithmic re nements. Together, these methods form a state-of-the-art retrieval system that ranked among the best on the TREC-7 ad hoc retrieval task.
David R. H. Miller, Tim Leek, Richard M. Schwartz