Sciweavers

» Extracting Structured Data from Web Pages
CIKM
2008
ACM
13 years 10 months ago
CoreEx: Content Extraction from Online News Articles
We developed and tested a heuristic technique for extracting the main article from news site Web pages. We construct the DOM tree of the page and score every node based on the amo...
Jyotika Prasad, Andreas Paepcke
LPNMR
2001
Springer
14 years 1 month ago
Declarative Information Extraction, Web Crawling, and Recursive Wrapping with Lixto
Lixto is a system and method for the visual and interactive generation of wrappers for Web pages under the supervision of a human developer, for automatically extracting informatio...
Robert Baumgartner, Sergio Flesca, Georg Gottlob
PKDD
2004
Springer
14 years 2 months ago
Breaking Through the Syntax Barrier: Searching with Entities and Relations
The next wave in search technology will be driven by the identification, extraction, and exploitation of real-world entities represented in unstructured textual sources. Search sy...
Soumen Chakrabarti
WWW
2004
ACM
14 years 9 months ago
Learning Block Importance Models for Web Pages
Some previous works show that a web page can be partitioned to multiple segments or blocks, and usually the importance of those blocks in a page is not equivalent. Also, it is pro...
Ruihua Song, Haifeng Liu, Ji-Rong Wen, Wei-Ying Ma
JUCS
2008
13 years 8 months ago
Recognising Informative Web Page Blocks Using Visual Segmentation for Efficient Information Extraction
As web sites are getting more complicated, the construction of web information extraction systems becomes more troublesome and time-consuming. A common theme is the diffi...
Jinbeom Kang, Joongmin Choi