We present a new method for information retrieval using hidden Markov models (HMMs) and relate our experience with this system on the TREC-7 ad hoc task. We develop a general framework for incorporating multiple word generation mechanisms within the same model. We then demonstrate that an extremely simple realization of this model substantially outperforms tf.idf ranking on both the TREC-6 and TREC-7 ad hoc retrieval tasks. We go on to present several algorithmic refinements, including a novel method for performing blind feedback in the HMM framework. Together, these methods form a state-of-the-art retrieval system that ranked among the best on the TREC-7 ad hoc retrieval task, and showed extraordinary performance in development experiments on TREC-6.
David R. H. Miller, Tim Leek, Richard M. Schwartz