A novel fact extraction task is defined to fill a gap between current information retrieval and information extraction technologies. It is shown that it is possible to extract usef...
Andrew Salway, Liadh Kelly, Inguna Skadina, Gareth...
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
Thereis a wealthof informationto be minedfromnarrative text on the WorldWideWeb.Unfortunately, standard natural language processing (NLP)extraction techniques expect full, grammat...
In this paper, we propose a new system extracting potentially copyright infringement texts from the Web, called EPCI. EPCI extracts them in the following way: (1) generating a set...
Takashi Tashiro, Takanori Ueda, Taisuke Hori, Yu H...
This paper studies automatic extraction of structured data from Web pages. Each of such pages may contain several groups of structured data records. Existing automatic methods stil...