AIPRF
2007

Evaluation of Different Approaches to Training a Genre Classifier

This paper presents experiments on classifying web pages by genre. First, a corpus of 1539 manually labeled web pages was prepared. Second, 502 genre features were selected based on the literature and on observation of the corpus. Third, these features were extracted from the corpus to obtain a data set. Finally, three machine learning algorithms, one for induction of decision trees (J48) and two ensemble algorithms (bagging and boosting), were trained and tested on the data set. Additionally, the impact of feature selection on the ensemble algorithms was tested. The best-performing genre classifiers in terms of precision were selected to form a best-of set of classifiers. On average, the best-of set achieved 9% better precision but slightly worse recall. Accuracy and F-measure did not vary significantly. The results indicate that classification by genre could be a useful addition to search engines.
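The evaluation measures named in the abstract, and the idea of picking the most precise classifier for each genre, can be illustrated with a minimal Python sketch. It assumes one binary classifier per genre and per algorithm, which the abstract does not state explicitly. The `Counts` container, the `best_of_set` helper, the genre names, and all numbers are hypothetical and chosen only for illustration; they are not the authors' code or results.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one binary genre classifier (hypothetical container)."""
    tp: int  # pages correctly assigned to the genre
    fp: int  # pages wrongly assigned to the genre
    fn: int  # pages of the genre that were missed
    tn: int  # pages correctly left out of the genre


def precision(c):
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c):
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f_measure(c):
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(c):
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def best_of_set(results):
    """For each genre, return the name of the algorithm with the highest precision.

    `results` maps genre -> {algorithm name -> Counts}.
    """
    return {
        genre: max(by_algo, key=lambda name: precision(by_algo[name]))
        for genre, by_algo in results.items()
    }


# Made-up counts for illustration only; they are not results from the paper.
example = {
    "blog": {
        "J48": Counts(tp=40, fp=20, fn=10, tn=130),
        "bagging": Counts(tp=36, fp=9, fn=14, tn=141),
    },
    "shopping": {
        "J48": Counts(tp=30, fp=10, fn=20, tn=140),
        "boosting": Counts(tp=28, fp=12, fn=22, tn=138),
    },
}
chosen = best_of_set(example)
```

In this made-up data, the classifier chosen for "blog" has higher precision but lower recall than the alternative, which mirrors the kind of trade-off the abstract reports for the best-of set.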
Vedrana Vidulin, Mitja Lustrek, Matjaz Gams
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where AIPRF