One aspect in which retrieving named entities is different from retrieving documents is that the items to be retrieved – persons, locations, organizations – are only indirectly described by documents throughout the collection. Much work has been dedicated to finding references to named entities, in particular to the problems of named entity extraction and disambiguation. However, just as important for retrieval performance is how these snippets of text are combined to build named entity representations. We focus on the TREC expert search task where the goal is to identify people who are knowledgeable on a specific topic. Existing language modeling techniques for expert finding assume that terms and person entities are conditionally independent given a document. We present theoretical and experimental evidence that this simplifying assumption ignores information on how named entities relate to document content. To address this issue, we propose a new document representation whi...
Desislava Petkova, W. Bruce Croft