Impact on Performance of Hypertext Classification of Selective Rich HTML Capture

Abstract: Hypertext categorization is the automatic classification of web documents into predefined classes. It poses new challenges for automatic categorization because a hypertext document carries information that plain text does not: hyperlinks, HTML tags, and metadata all supply evidence that is unavailable in traditional text classification. This paper looks at (i) which document representation to use and which extra information hidden in HTML pages to take into account to improve classification, and (ii) how to deal with the very high number of features in text. A hypertext dataset and three well-known learning algorithms (Na

Houda Benbrahim, Max Bramer

Hypertext | Hypertext Categorization | IFIP12 2004 | IFIP12 2007 | Rich Information

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2004
Where	IFIP12
Authors	Houda Benbrahim, Max Bramer
