Portable Extraction of Partially Structured Facts from the Web


Download tripod.shef.ac.uk

A novel fact extraction task is defined to fill a gap between current information retrieval and information extraction technologies. It is shown that it is possible to extract useful, partially structured facts about different kinds of entities in a broad domain, i.e. all kinds of places depicted in tourist images. Importantly, the approach does not rely on existing linguistic resources (gazetteers, taggers, parsers, etc.), and it ported easily and cheaply between two very different languages (English and Latvian). Previous fact extraction from the web has focused on the extraction of structured data, e.g. (BuildingLocatedIn-Town). In contrast, we extract richer and more interesting facts, such as a fact explaining why a building was built. Enough structure is maintained to facilitate subsequent processing of the information. For example, the partial structure enables straightforward template-based text generation. We report positive results for the correctness and interest of English and ...
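To make the idea of template-based generation from partially structured facts concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `PartialFact` fields, the fact-type labels, the templates, and the example sentence are all hypothetical and chosen only for illustration. The sketch assumes a fact keeps a small amount of structure (an entity and a fact type) while the rest stays free text.

```python
from dataclasses import dataclass


@dataclass
class PartialFact:
    """A hypothetical partially structured fact (illustrative only)."""
    entity: str     # the place the fact is about
    fact_type: str  # assumed coarse label, e.g. "built_for"
    text: str       # unstructured remainder, kept as free text


# Hypothetical templates keyed by fact type; the paper's own templates
# are not given in the abstract.
TEMPLATES = {
    "built_for": "{entity} was built {text}.",
    "known_for": "{entity} is known for {text}.",
}


def render(fact: PartialFact) -> str:
    """Fill the template for the fact's type, with a generic fallback."""
    template = TEMPLATES.get(fact.fact_type, "{entity}: {text}.")
    # Strip a trailing full stop so the template does not double it.
    return template.format(entity=fact.entity, text=fact.text.rstrip("."))


if __name__ == "__main__":
    fact = PartialFact("The tower", "built_for", "to guard the harbour")
    print(render(fact))
```

Because only the entity and fact type are structured, a template can be selected by type alone, and the free-text part is inserted unchanged.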

Andrew Salway, Liadh Kelly, Inguna Skadina, Gareth


Current Information Retrieval | Information Extraction Technologies | Natural Language Processing | TAL 2010 | Template-based Text Generation


» Corroborate and learn facts from the web

» FACTO a fact lookup engine based on web tables

» Existentially Quantified Values for Queries and Updates of Facts in Transaction Logic Prog...

» Automatic Wrapper Generation Using Tree Matching and Partial Tree Alignment

» Web data extraction based on partial tree alignment

» Question Answering over Implicitly Structured Web Content

» Wikipedia Link Structure and Text Mining for Semantic Relation Extraction

» From HTML documents to web tables and rules

Post Info

Added	30 Jan 2011
Updated	30 Jan 2011
Type	Journal
Year	2010
Where	TAL
Authors	Andrew Salway, Liadh Kelly, Inguna Skadina, Gareth J. F. Jones
