Towards Structure-sensitive Hypertext Categorization

Download elara.tk.informatik.tu-darmstadt.de

Abstract. Hypertext categorization is the task of automatically assigning category labels to hypertext units. Comparable to text categorization, it falls within the area of function learning based on the bag-of-features approach. This scenario faces the problem of a many-to-many relation between websites and their hidden logical document structure. The paper argues that this relation is a prevalent characteristic which interferes with any effort to apply the classical apparatus of categorization to web genres. This is confirmed by a threefold experiment in hypertext categorization. In order to outline a solution to this problem, the paper sketches an alternative method of unsupervised learning which aims at bridging the gap between statistical and structural pattern recognition (Bunke et al. 2001) in the area of web mining.

Alexander Mehler, Rüdiger Gleim, Matthias Dehmer

GFKL 2005 | Hypertext Categorization | Logical Document Structure | Many-to-many Relation |

Added	27 Jun 2010
Updated	27 Jun 2010
Type	Conference
Year	2005
Where	GFKL
Authors	Alexander Mehler, Rüdiger Gleim, Matthias Dehmer

Sciweavers
