Supervised Term Weighting for Automated Text Categorization

14 years 10 months ago

The construction of a text classifier usually involves (i) a phase of term selection, in which the most relevant terms for the classification task are identified, (ii) a phase of term weighting, in which document weights for the selected terms are computed, and (iii) a phase of classifier learning, in which a classifier is generated from the weighted representations of the training documents. This process involves an activity of supervised learning, in which information on the membership of training documents in categories is used. Traditionally, supervised learning enters only phases (i) and (iii). In this paper we propose instead that learning from training data should also affect phase (ii), i.e., that information on the membership of training documents in categories be used to determine term weights. We call this idea supervised term weighting (STW). As an example, we propose a number of “supervised variants” of tfidf weighting, obtained by replacing the idf function with...
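To make the contrast between unsupervised and supervised weighting concrete, here is a minimal Python sketch. The abstract above is cut off before it names the function that replaces idf, so the chi-square statistic used here is purely an illustrative assumption, not necessarily the authors' choice. The function names (`chi_square`, `idf`, `weight_document`), the log-scaled tf factor, the cosine normalization, and the toy data are likewise hypothetical.

```python
import math
from collections import Counter


def chi_square(term, docs, labels):
    """Chi-square association between term presence and a binary category.

    Assumption: chi-square stands in for the supervised factor; the source
    abstract is truncated and does not say which function is used.
    """
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y)          # term present, in category
    b = sum(1 for d, y in zip(docs, labels) if term in d and not y)      # term present, not in category
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y)      # term absent, in category
    d_ = n - a - b - c                                                   # term absent, not in category
    denom = (a + b) * (c + d_) * (a + c) * (b + d_)
    return 0.0 if denom == 0 else n * (a * d_ - b * c) ** 2 / denom


def idf(term, docs):
    """Classic inverse document frequency; ignores category labels."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0


def weight_document(tokens, docs, labels, supervised=True):
    """Weight each term as tf factor times a collection-level factor, then cosine-normalize."""
    weights = {}
    for term, count in Counter(tokens).items():
        factor = chi_square(term, docs, labels) if supervised else idf(term, docs)
        weights[term] = (1 + math.log(count)) * factor
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else weights


# Toy training set (hypothetical): each document is a set of terms,
# and each label says whether the document belongs to the category.
train_docs = [
    {"goal", "match", "team"},
    {"team", "goal", "win"},
    {"stock", "market", "team"},
    {"market", "price", "stock"},
]
train_labels = [True, True, False, False]

new_doc = ["goal", "team"]
stw = weight_document(new_doc, train_docs, train_labels, supervised=True)
tfidf = weight_document(new_doc, train_docs, train_labels, supervised=False)
print("supervised:", stw)
print("tfidf:     ", tfidf)
```

In this toy setting both schemes rank "goal" above "team", but for different reasons: idf rewards "goal" for being rarer in the collection, while the supervised factor rewards it for occurring only in documents of the category. The supervised factor is computed from training labels only, so test documents are weighted without access to their own labels.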

Franca Debole, Fabrizio Sebastiani
