Neighbourhood Exploitation in Hypertext Categorization


Download www.maxbramer.org.uk

As the web expands exponentially, the need to bring some order to its content becomes apparent. Hypertext categorization, that is, the automatic classification of web documents into predefined classes, emerged to relieve humans of that task. The extra information available in a hypertext document poses new challenges for automatic categorization. HTML tags and the linked neighbourhood both provide rich information for hypertext categorization that is not available in traditional text classification. This paper looks at (i) which extra information hidden in HTML tags and linked neighbourhood pages to take into consideration to improve the classification task, and (ii) how to deal with the high level of noise in linked pages. A hypertext dataset and four well-known learning algorithms (Naïve Bayes, k-Nearest Neighbour, Support Vector Machine and C4.5) were used to exploit the enriched text representation. The results showed that the clever use of the information in linked neighbourhood and HTML ...
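To make the idea of an "enriched text representation" concrete, here is a minimal sketch of one way such features might be built. It is not the authors' method: the field names (title, heading, body, nb_title), the choice to keep only neighbour titles as a crude guard against noise, and the neighbour_weight value are all assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TagTextExtractor(HTMLParser):
    """Collect text separately for <title>, headings and everything else."""

    # Hypothetical mapping from HTML tag to feature field.
    FIELDS = {"title": "title", "h1": "heading", "h2": "heading", "h3": "heading"}

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags of interest
        self.text = {"title": [], "heading": [], "body": []}

    def handle_starttag(self, tag, attrs):
        if tag in self.FIELDS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        field = self.FIELDS[self.stack[-1]] if self.stack else "body"
        self.text[field].append(data)


def tokenize(text):
    """Lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def enriched_features(page_html, neighbour_htmls, neighbour_weight=0.5):
    """Bag of words whose terms are prefixed by where they were found."""
    features = Counter()
    parser = TagTextExtractor()
    parser.feed(page_html)
    for field, chunks in parser.text.items():
        for tok in tokenize(" ".join(chunks)):
            features[f"{field}:{tok}"] += 1.0
    for html in neighbour_htmls:
        nb = TagTextExtractor()
        nb.feed(html)
        # Assumption: only neighbour titles are kept, and down-weighted,
        # so that noisy neighbour body text does not swamp the target page.
        for tok in tokenize(" ".join(nb.text["title"])):
            features[f"nb_title:{tok}"] += neighbour_weight
    return features
```

The resulting dictionary of prefixed terms could then be vectorised and passed to any of the four learners named in the abstract. Prefixing terms by origin lets a classifier weigh a word in a title differently from the same word in body text.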

Houda Benbrahim, Max Bramer


Extra Information | HTML Tags | Hypertext Categorization | SGAI 2004 |


Post Info

Added	02 Jul 2010
Updated	02 Jul 2010
Type	Conference
Year	2004
Where	SGAI
Authors	Houda Benbrahim, Max Bramer
