We consider the problem of automatically extracting general lists from the web. Existing approaches are mostly dependent upon either the underlying HTML markup or the visual struc...
Fabio Fumarola, Tim Weninger, Rick Barber, Donato ...
The proliferation of electronic content has notably lead to the apparition of large corpora of interrelated structured documents (such as HTML and XML Web pages) and semantic annot...
This paper investigates an application of the ranked region algebra to information retrieval from large scale but unannotated documents. We automatically annotated documents with ...
There exist two types of wrappers: the string based wrapper such as the LR wrapper, and the tree based wrapper. A tree based wrapper designates extraction regions by nodes on the ...
The exponential data growth rate of the Internet makes it increasingly difficult for people to find desired information in a timely fashion. Information filtering and dissemina...