Flint: Google-basing the Web

Several Web sites deliver a large number of pages, each publishing data about one instance of some real-world entity, such as an athlete, a stock quote, or a book. Even though it is easy for a human reader to recognize these instances, current search engines are unaware of them. Technologies for the Semantic Web aim to make such instances recognizable to machines; however, so far they have been of little help in this respect, as semantic publishing is very limited. We have developed a system, called FLINT, for automatically searching, collecting, and indexing Web pages that publish data representing an instance of a certain conceptual entity. FLINT takes as input a small set of labeled sample pages: it automatically infers a description of the underlying conceptual entity and then searches the Web for other pages containing data representing the same entity. FLINT automatically extracts data from the collected pages and stores them in a semi-structured, self-describing database, such as Google Base. Also, the co...
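The workflow described in the abstract (infer a description from sample pages, find other pages about the same kind of entity, extract their data into self-describing records) can be illustrated with a toy sketch. This is not FLINT's actual algorithm, which the abstract does not detail. It assumes, purely for illustration, that a page is plain text with `label: value` pairs and that the entity description is simply the set of words shared by all sample pages. Every function name here (`tokens`, `infer_description`, `matches_entity`, `extract_record`) is hypothetical.

```python
import re


def tokens(page_text):
    """Return the set of lowercased alphabetic words in a page (a crude page model)."""
    return set(re.findall(r"[a-z]+", page_text.lower()))


def infer_description(sample_pages):
    """Assumed stand-in for description inference: words shared by every sample page."""
    return set.intersection(*(tokens(page) for page in sample_pages))


def matches_entity(page_text, description, threshold=0.8):
    """True if the page contains at least `threshold` of the description words."""
    if not description:
        return False
    overlap = len(description & tokens(page_text)) / len(description)
    return overlap >= threshold


def extract_record(page_text, description):
    """Collect 'label: value' pairs whose label belongs to the description."""
    record = {}
    for label, value in re.findall(r"([A-Za-z]+):\s*([^\n;]+)", page_text):
        if label.lower() in description:
            record[label.lower()] = value.strip()
    return record


# Two labeled sample pages about athletes (made-up data).
samples = [
    "Player: Ann Lee; Height: 180 cm; Team: Rome",
    "Player: Bo Diaz; Height: 175 cm; Team: Oslo",
]
description = infer_description(samples)

# Candidate pages, standing in for pages found by searching the Web.
candidates = [
    "Player: Cy Fox; Height: 190 cm; Team: Lima",
    "Recipe: pasta; Time: 20 min",
]
records = [
    extract_record(page, description)
    for page in candidates
    if matches_entity(page, description)
]
```

Each extracted record is a dictionary that carries its own field names, which is the sense in which the target store is semi-structured and self-describing. A real system would need a far more robust page model and description than shared words.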

Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, Paolo Papotti

Certain Conceptual Entity | Database | EDBT 2008 | Pages Containing Data | Real World Entity |

» HDSampler revealing data behind web form interfaces

» SourceRank relevance and trust assessment for deep web sources based on intersource agreem...

» Structured Data on the Web

Post Info

Added	08 Dec 2009
Updated	08 Dec 2009
Type	Conference
Year	2008
Where	EDBT
Authors	Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, Paolo Papotti

Sciweavers
