Retrieving answers from frequently asked questions pages on the web



We address the task of answering natural language questions by using the large number of Frequently Asked Questions (FAQ) pages available on the web. The task involves three steps: (1) fetching FAQ pages from the web; (2) automatically extracting question/answer (Q/A) pairs from the collected pages; and (3) answering users' questions by retrieving appropriate Q/A pairs. We discuss our solutions for each of the three steps and give detailed evaluation results on a collected corpus of about 3.6 GB of text data (293K pages, 2.8M Q/A pairs), using real users' questions sampled from a web search engine log. Specifically, we propose simple but effective methods for Q/A extraction and investigate task-specific retrieval models for answering questions. Our best model finds answers for 36% of the test questions in the top 20 results. Our overall conclusion is that FAQ pages on the web provide an excellent resource for addressing real users' information needs in a highly focused manner. ...
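The abstract does not spell out the extraction or retrieval methods. As a rough illustration of steps (2) and (3) and of the top-20 evaluation, the Python sketch below swaps in assumed techniques. It uses a line-based heuristic that treats a line ending in "?" as a question, a field-weighted TF-IDF ranker that counts matches in the question field more heavily than matches in the answer, and a success@k measure (the fraction of test questions with at least one correct pair in the top k). The function names, field weights and heuristics are hypothetical and are not the authors' models.

```python
import math
import re
from collections import Counter


def extract_qa_pairs(text):
    # Assumed heuristic: a line ending in "?" opens a question; the
    # non-question lines after it, up to the next question, form the answer.
    pairs, question, answer = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("?"):
            if question and answer:
                pairs.append((question, " ".join(answer)))
            question, answer = line, []
        elif question is not None:
            answer.append(line)
    if question and answer:
        pairs.append((question, " ".join(answer)))
    return pairs


def tokenize(s):
    # Lowercase alphanumeric tokens; no stemming or stopword removal.
    return re.findall(r"[a-z0-9]+", s.lower())


def rank_pairs(query, pairs, k=20, w_question=2.0, w_answer=1.0):
    # Field-weighted TF-IDF (hypothetical weights): each query term adds
    # idf * (w_question * tf_in_question + w_answer * tf_in_answer).
    q_tf = [Counter(tokenize(q)) for q, _ in pairs]
    a_tf = [Counter(tokenize(a)) for _, a in pairs]
    n = len(pairs)
    df = Counter()
    for qt, at in zip(q_tf, a_tf):
        df.update(set(qt) | set(at))
    scores = []
    for qt, at in zip(q_tf, a_tf):
        s = 0.0
        for term in tokenize(query):
            if df[term] == 0:
                continue
            idf = math.log(1 + n / df[term])
            s += idf * (w_question * qt[term] + w_answer * at[term])
        scores.append(s)
    # Indices of the k highest-scoring pairs (ties keep corpus order).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order[:k]


def success_at_k(rankings, relevant, k=20):
    # Fraction of test questions with at least one relevant pair in the top k.
    hits = sum(any(i in rel for i in ranking[:k])
               for ranking, rel in zip(rankings, relevant))
    return hits / len(rankings)
```

For example, `extract_qa_pairs` on a page containing "How do I reset my password?" followed by an answer line yields that pair, and `rank_pairs("reset password", pairs, k=1)` returns its index. Weighting the question field more heavily reflects the intuition that a user's question tends to resemble the FAQ question more closely than the answer text; this is an assumption of the sketch, not a finding reported above.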

Valentin Jijkoun, Maarten de Rijke
