TREC
2004

Feature Generation, Feature Selection, Classifiers, and Conceptual Drift for Biomedical Document Triage

We approached the problem of classifying papers for the TREC 2004 Genomics Track triage task as a four-step process: feature generation, feature selection, classifier training, and classification. Section-specific binary features that discriminated significantly between positive and negative training samples were chosen using the chi-square statistic. Three classifiers were trained on this feature set: a simple Naive Bayes classifier, the SVMLight support vector machine implementation, and a voting perceptron extended to support variable learning rates. Comparing the classifiers on the training data, we found that neither Naive Bayes nor SVMLight was able to adequately account for the factor of 20 in the utility function; the voting perceptron classifier performed much better in this respect. Performance on the test collection was lower for all classifiers, although consistent with the relative values of the training cross-validation. Feature subsetting showed no significant differ...
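As a rough illustration of two ideas mentioned in the abstract, the Python sketch below scores a binary feature with a chi-square statistic computed from a 2x2 contingency table, and evaluates a utility score in which each true positive counts 20 times as much as each false positive costs. The function names, the table layout, and the exact utility form (20 x TP minus FP, divided by 20 x the number of positives) are assumptions made for illustration; they are not taken from the paper.

```python
def chi_square_2x2(pos_with, neg_with, pos_without, neg_without):
    """Pearson chi-square statistic for one binary feature.

    pos_with / neg_with: positive / negative documents containing the feature.
    pos_without / neg_without: positive / negative documents lacking it.
    (Hypothetical helper; the argument layout is an assumption.)
    """
    n = pos_with + neg_with + pos_without + neg_without
    denom = (
        (pos_with + neg_with)
        * (pos_without + neg_without)
        * (pos_with + pos_without)
        * (neg_with + neg_without)
    )
    if denom == 0:
        # A row or column is empty, so the feature carries no information.
        return 0.0
    return n * (pos_with * neg_without - neg_with * pos_without) ** 2 / denom


def normalized_utility(true_pos, false_pos, total_pos, weight=20):
    """Assumed utility: (weight * TP - FP) / (weight * total positives).

    The default weight of 20 mirrors the "factor of 20" in the abstract;
    the normalization by the best achievable score is an assumption.
    """
    if total_pos <= 0:
        raise ValueError("total_pos must be positive")
    return (weight * true_pos - false_pos) / (weight * total_pos)
```

Under this assumed utility, a classifier that retrieves every positive document with no false positives scores 1.0, while one that retrieves nothing scores 0.0. A classifier trained to minimize plain error rate treats both kinds of mistake equally, which is one way to read the abstract's remark that some classifiers did not account for the factor of 20.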
Added: 31 Oct 2010
Updated: 31 Oct 2010
Type: Conference
Year: 2004
Where: TREC
Authors: Aaron M. Cohen, Ravi Teja Bhupatiraju, William R. Hersh