Sciweavers

543 search results - page 11 / 109
» Exploiting content redundancy for web information extraction
JUCS
2008
13 years 7 months ago
Recognising Informative Web Page Blocks Using Visual Segmentation for Efficient Information Extraction
Abstract: As web sites are getting more complicated, the construction of web information extraction systems becomes more troublesome and time-consuming. A common theme is the diffi...
Jinbeom Kang, Joongmin Choi
WIRI
2005
IEEE
14 years 1 month ago
Extended Link Analysis for Extracting Spatial Information Hubs
Recently, web mining, which tries to find useful knowledge in the vast number of web pages, has attracted a lot of research interest. Besides, it is becoming an essential task to...
Jianwei Zhang 0002, Yoshiharu Ishikawa, Hiroyuki K...
WWW
2005
ACM
14 years 1 month ago
An information extraction engine for web discussion forums
In this poster, we present an information extraction engine for web-based forums. The engine analyzes the HTML files crawled from web forums, deduces the wrapper (template) of the...
Hanny Yulius Limanto, Nguyen Ngoc Giang, Vo Tan Tr...
WWW
2005
ACM
14 years 8 months ago
Hybrid semantic tagging for information extraction
The semantic web is expected to have an impact at least as big as that of the existing HTML-based web, if not greater. However, the challenge lies in creating this semantic web an...
Ronen Feldman, Binyamin Rosenfeld, Moshe Fresko, B...
AIRWEB
2008
Springer
13 years 9 months ago
Web spam identification through content and hyperlinks
We present an algorithm, WITCH, that learns to detect spam hosts or pages on the Web. Unlike most other approaches, it simultaneously exploits the structure of the Web graph as we...
Jacob Abernethy, Olivier Chapelle, Carlos Castillo