Early TREC-style question answering (QA) systems were characterized by the following features: (a) the answer to the question was known to be included in a given local corpus, (b) the corpus was small enough to permit preprocessing, including named entity extraction and parsing of all documents, and (c) the corpus consisted of well-written news documents. More recently, QA systems have started to use the Web as a corpus, either by extracting answers from the Web rather than from a local corpus or by learning lexical patterns from the Web, which are then used to improve the system itself. Using the Web for question answering presents an interesting combination of opportunities and challenges. This panel will discuss how to leverage the opportunities while addressing the inevitable challenges.

The Web as a repository of answers

Scaling QA systems to the Web presents an extraordinary challenge. The collections used in TREC-8 through TREC-10 contained a couple of hundred thousand documents, while Google indexes m...
Dragomir R. Radev