WEBI
2005
Springer

Automated Metadata and Instance Extraction from News Web Sites

In this paper, we present automated techniques for extracting metadata and instance information by organizing and mining a set of news Web sites. We develop algorithms that detect and utilize HTML regularities in Web documents to turn them into hierarchical semantic structures encoded as XML. We present tree-mining algorithms that identify key domain concepts and their taxonomic relationships. We also extract semi-structured concept instances, annotated with their labels whenever they are available. We report an experimental evaluation on the news domain to demonstrate the efficacy of our algorithms.
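
To make the idea of turning HTML regularities into a hierarchical XML structure concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes, purely for illustration, that heading tags (h1 to h6) mark concept labels and that list items and paragraphs hold instances. The names HeadingHierarchyParser and html_to_xml, and the element names document, concept, and instance, are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption for this sketch: heading depth is the only regularity used.
HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingHierarchyParser(HTMLParser):
    """Builds an XML tree in which headings become nested <concept> nodes."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")
        # Stack of (heading level, XML element); the root sits at level 0.
        self.stack = [(0, self.root)]
        self.current_heading_level = None
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:
            # Text seen before the heading belongs to the previous concept.
            self._flush_text()
            self.current_heading_level = HEADING_LEVELS[tag]

    def handle_endtag(self, tag):
        if tag in HEADING_LEVELS and self.current_heading_level is not None:
            label = " ".join("".join(self.buffer).split())
            self.buffer = []
            level = self.current_heading_level
            self.current_heading_level = None
            # Pop until the top of the stack is a strictly shallower heading.
            while self.stack[-1][0] >= level:
                self.stack.pop()
            concept = ET.SubElement(self.stack[-1][1], "concept", label=label)
            self.stack.append((level, concept))
        elif tag in ("li", "p"):
            self._flush_text()

    def handle_data(self, data):
        self.buffer.append(data)

    def _flush_text(self):
        # Normalize whitespace and attach non-empty text as an instance.
        text = " ".join("".join(self.buffer).split())
        self.buffer = []
        if text:
            ET.SubElement(self.stack[-1][1], "instance").text = text

    def close(self):
        super().close()
        self._flush_text()


def html_to_xml(html):
    """Return the root XML element for the given HTML string."""
    parser = HeadingHierarchyParser()
    parser.feed(html)
    parser.close()
    return parser.root


if __name__ == "__main__":
    sample = (
        "<h1>News</h1><h2>Sports</h2>"
        "<ul><li>Match report</li><li>Transfer news</li></ul>"
        "<h2>Politics</h2><p>Election update</p>"
    )
    print(ET.tostring(html_to_xml(sample), encoding="unicode"))
```

In this sketch, a stack keyed by heading level decides where each new concept attaches, so a second h2 becomes a sibling of the first rather than its child. Real news pages rarely mark structure with headings alone, so a practical system would need to infer such regularities from layout and repeated tag patterns.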
Added 28 Jun 2010
Updated 28 Jun 2010
Type Conference
Year 2005
Where WEBI
Authors Srinivas Vadrevu, Saravanakumar Nagarajan, Fatih Gelgi, Hasan Davulcu