Importance of HTML Structural Elements and Metadata in Automated Subject Classification



The aim of the study was to determine how significance indicators assigned to different Web page elements (internal metadata, title, headings, and main text) influence automated classification. The data collection comprised 1000 Web pages in engineering, to which Engineering Information classes had been manually assigned. The significance indicators were derived using several methods: total and partial precision and recall, semantic distance, and multiple regression. The results showed that all elements must be included in the classification process to obtain the best results. The exact way of combining the significance indicators turned out to matter little: measured by the F1 measure, the best combination of significance indicators performed no more than 3% better than the baseline.
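To make the idea of element-level significance indicators concrete, the sketch below scores classes by weighting term matches found in each page element and then evaluates the assigned classes with a micro-averaged F1 measure. It is a minimal illustration under stated assumptions, not the authors' classifier: the weights in `WEIGHTS`, the toy `vocabulary`, the `min_score` threshold, and the single-word term matching are all hypothetical, and the helper names `score_classes`, `assign_classes`, and `micro_f1` are introduced here only for illustration.

```python
from collections import Counter

# Hypothetical significance indicators (weights) for the four page elements
# named in the abstract. Illustrative values only, NOT those derived in the study.
WEIGHTS = {"metadata": 3.0, "title": 2.0, "headings": 2.0, "text": 1.0}


def score_classes(page, vocabulary, weights):
    """Score each class by weighted counts of its terms in each page element.

    page: dict mapping element name -> extracted text.
    vocabulary: dict mapping class id -> list of single-word, lowercase terms
    (a hypothetical stand-in for a controlled vocabulary).
    """
    scores = Counter()
    for element, text in page.items():
        words = text.lower().split()
        weight = weights.get(element, 0.0)  # unknown elements contribute nothing
        for cls, terms in vocabulary.items():
            hits = sum(words.count(term) for term in terms)
            if hits:
                scores[cls] += weight * hits
    return scores


def assign_classes(scores, min_score):
    """Assign every class whose weighted score reaches a (hypothetical) threshold."""
    return {cls for cls, score in scores.items() if score >= min_score}


def micro_f1(predicted, gold):
    """Micro-averaged F1 over pages; each argument is a list of sets of class ids."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example data, invented for illustration.
vocabulary = {"A": ["turbine", "blade"], "B": ["circuit", "voltage"]}
page = {
    "metadata": "",
    "title": "Wind turbine blade design",
    "headings": "Blade loads",
    "text": "The turbine blade carries loads; voltage is not discussed",
}

scores = score_classes(page, vocabulary, WEIGHTS)
print(dict(scores))
print(assign_classes(scores, min_score=2.0))
print(micro_f1([{"A"}, {"A", "B"}], [{"A"}, {"B"}]))
```

Running `micro_f1` over a manually classified collection once with equal weights and once with tuned weights is one simple way to measure how much a particular weighting scheme changes performance.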

Koraljka Golub, Anders Ardö


ERCIMDL 2005 | Influence Automated Classification | Significance Indicators | Web Pages


Post Info

Added	27 Jun 2010
Updated	27 Jun 2010
Type	Conference
Year	2005
Where	ERCIMDL
Authors	Koraljka Golub, Anders Ardö


Sciweavers
