Hidden Web databases maintain a collection of specialised documents, which are dynamically generated in response to users' queries. The majority of these documents are generated through Web page templates, which contain information that is often irrelevant to queries. In this paper, we present a system designed to detect and extract query-related information from documents sampled from databases. The proposed system, 2PS, is based on a two-phase framework for the sampling, extraction and summarisation of Hidden Web documents. In the first phase, 2PS queries databases with random terms selected from those contained in their search interface pages and the subsequently retrieved documents
Yih-Ling Hedley, Muhammad Younas, Anne E. James, M