We address the problem of identifying the domain of online databases. More precisely, given a set F of Web forms automatically gathered by a focused crawler and an online database...
Incorporating features extracted from clickthrough data (called clickthrough features) has been demonstrated to significantly improve the performance of ranking models for Web sea...
Tools for mining information from data can create added value for the Internet. As the majority of electronic documents available over the network are in unstructured textual form...
Summarization of text documents is increasingly important given the growing amount of data available on the Internet. The vast majority of current approaches view documents as linear sequ...
We present a general framework for the task of extracting specific information “on demand” from a large corpus such as the Web under resource constraints. Given a database wit...