Sciweavers

» Extracting Structured Data from Web Pages
ICWE
2009
Springer
14 years 3 months ago
A Layout-Independent Web News Article Contents Extraction Method Based on Relevance Analysis
The traditional Web news article contents extraction methods are time-costly and need much maintenance because they analyze the layout of news pages to generate the wrapp...
Hao Han, Takehiro Tokuda
WWW
2008
ACM
14 years 9 months ago
Web page sectioning using regex-based template
This work aims to provide a novel, site-specific web page segmentation and section importance detection algorithm, which leverages structural, content, and visual information. The...
Rupesh R. Mehta, Amit Madaan
WWW
2009
ACM
14 years 3 months ago
News article extraction with template-independent wrapper
We consider the problem of template-independent news extraction. The state-of-the-art news extraction method is based on template-level wrapper induction, which has two serious li...
Junfeng Wang, Xiaofei He, Can Wang, Jian Pei, Jiaj...
ICANN
2005
Springer
14 years 2 months ago
Content-Based Retrieval of Web Pages and Other Hierarchical Objects with Self-organizing Maps
We propose a content-based information retrieval (CBIR) method that models known relationships between multimedia objects as a hierarchical tree-structure incorporating additional ...
Mats Sjöberg, Jorma Laaksonen
CIKM
2008
Springer
13 years 10 months ago
Dr. Searcher and Mr. Browser: a unified hyperlink-click graph
We introduce a unified graph representation of the Web, which includes both structural and usage information. We model this graph using a simple union of the Web's hyperlink ...
Barbara Poblete, Carlos Castillo, Aristides Gionis