Sciweavers

543 search results - page 14 / 109
» Exploiting content redundancy for web information extraction
Sort
View
CIKM
2008
Springer
13 years 9 months ago
Coreex: content extraction from online news articles
We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...
Jyotika Prasad, Andreas Paepcke
LREC
2008
110views Education» more  LREC 2008»
13 years 9 months ago
Unsupervised and Domain Independent Ontology Learning: Combining Heterogeneous Sources of Evidence
Acquiring knowledge from the Web to build domain ontologies has become a common practice in the Ontological Engineering field. The vast amount of freely available information allo...
David Manzano-Macho, Asunción Gómez-...
WWW
2007
ACM
14 years 8 months ago
Towards domain-independent information extraction from web tables
Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
Bernhard Krüpl, Bernhard Pollak, Marcus Herzo...
WWW
2010
ACM
14 years 2 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han
UCS
2007
Springer
14 years 1 months ago
DroPicks - A Tool for Collaborative Content Sharing Exploiting Everyday Artefacts
Emergence of social web services like YouTube[1], Flickr[2] etc. is constantly transforming the way we share our lifestyles with family, friends and colleagues. The significance of...
Simo Hosio, Fahim Kawsar, Jukka Riekki, Tatsuo Nak...