Sciweavers



WWW
2005
ACM


Extracting semantic structure of web documents using content and visual information



This work aims to provide a page segmentation algorithm that uses both visual and content information to extract the semantic structure of a web page. Visual information is obtained with the VIPS algorithm, and content information with a pre-trained Naive Bayes classifier. The output of the algorithm is a semantic structure tree whose leaves represent segments, each having a unique topic. However, the contents of a leaf segment may be physically distributed across the web page. This structure can be useful in many web applications, such as information retrieval, information extraction, and automatic web page adaptation. The algorithm is expected to outperform existing page segmentation algorithms because it utilizes both content and visual information.

Categories and Subject Descriptors: H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia
General Terms: Algorithms, Design.
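
To make the idea in the abstract concrete, here is a minimal Python sketch of combining visual blocks with a topic classifier. It is an illustration under stated assumptions, not the authors' implementation: the visual blocks are assumed to have already been produced by a VIPS-style segmenter (given here as plain strings in page order), the `NaiveBayes` class is a toy multinomial classifier written for this example, and `build_semantic_tree` is a hypothetical helper that simply groups blocks by predicted topic. The abstract does not describe how the paper actually merges blocks, so that step is an assumption.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()

    def train(self, labelled_texts):
        for text, label in labelled_texts:
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def build_semantic_tree(blocks, classifier):
    """Group visual blocks by predicted topic (assumed merging rule).

    Each leaf keeps (position, text) pairs, so a single leaf may hold
    blocks that are not adjacent on the page.
    """
    leaves = defaultdict(list)
    for position, text in enumerate(blocks):
        leaves[classifier.predict(text)].append((position, text))
    return {"page": dict(leaves)}


# Hypothetical training data and visual blocks, invented for this example.
classifier = NaiveBayes()
classifier.train([
    ("football match goal team score", "sports"),
    ("stock market shares price bank", "finance"),
])
blocks = [
    "team wins the match",
    "shares fall as bank price drops",
    "late goal decides score",
]
tree = build_semantic_tree(blocks, classifier)
```

In this toy run, the first and third blocks land in the same `sports` leaf even though a `finance` block sits between them on the page, which mirrors the abstract's remark that the contents of a leaf segment may be physically distributed.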

Rupesh R. Mehta, Pabitra Mitra, Harish Karnick


Internet Technology | Page Segmentation Algorithm | Page Segmentation Algorithms | VIPS Algorithm | WWW 2005


Related Content

» Extracting and Modeling the Semantic Information Content of Web Documents to Support Seman...

» Extracting Content Structure for Web Pages Based on Visual Representation

» Narrowing the semantic gap improved textbased web document retrieval using visual feature...

» Document Visualization on Small Displays

» Indexing Documents by Discourse and Semantic Contents from Automatic Annotations of Texts

» Web document text and images extraction using DOM analysis and natural language processing

» Logical structure based semantic relationship extraction from semistructured documents

» Recognising Informative Web Page Blocks Using Visual Segmentation for Efficient Informatio...

» Automatic extraction of clickable structured web contents for name entity queries

Post Info

Added	22 Nov 2009
Updated	22 Nov 2009
Type	Conference
Year	2005
Where	WWW
Authors	Rupesh R. Mehta, Pabitra Mitra, Harish Karnick
