Most Web-based Q/A systems work by finding pages that contain an explicit answer to a question. These systems are helpless if the answer has to be inferred from multiple sentences, possibly on different pages. To solve this problem, we introduce the HOLMES system, which utilizes textual inference (TI) over tuples extracted from text. Whereas previous work on TI (e.g., the literature on textual entailment) has been applied to paragraph-sized texts, HOLMES utilizes knowledge-based model construction to scale TI to a corpus of 117 million Web pages. Given only a few minutes, HOLMES doubles recall for example queries in three disparate domains (geography, business, and nutrition). Importantly, HOLMES's runtime is linear in the size of its input corpus due to a surprising property of many textual relations in the Web corpus--they are "approximately" functional in a well-defined sense.
Stefan Schoenmackers, Oren Etzioni, Daniel S. Weld