Feature Generation, Feature Selection, Classifiers, and Conceptual Drift for Biomedical Document Triage



We approached the problem of classifying papers for the TREC 2004 Genomics Track triage task as a four-step process: feature generation, feature selection, classifier training, and finally classification. Section-specific binary features that discriminated significantly between positive and negative training samples were chosen using the chi-square statistic. Three classifiers were trained on this feature set: a simple Naive Bayes classifier, the SVMLight support vector machine implementation, and a voting perceptron extended to support variable learning rates. Comparing the classifiers on the training data, we found that neither Naive Bayes nor SVMLight was able to adequately account for the factor of 20 in the utility function. The voting perceptron classifier performed much better at this. The performance on the test collection was lower for all classifiers, although consistent with the relative values of the training cross-validation. Feature subsetting showed no significant differ...
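The feature selection and scoring steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the representation of each document as a set of binary feature strings, and the 3.841 threshold (the chi-square critical value for one degree of freedom at p = 0.05) are assumptions made for illustration. The utility formula assumes that the "factor of 20" means a true positive is worth 20 times the cost of a false positive, normalized by the best achievable score.

```python
def chi_square_2x2(pos_with, pos_without, neg_with, neg_without):
    """Chi-square statistic for one binary feature against a binary label.

    Arguments are the four cells of the 2x2 contingency table:
    positive/negative documents with/without the feature.
    """
    a, b, c, d = pos_with, pos_without, neg_with, neg_without
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Feature or label is constant: no evidence of association.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, threshold=3.841):
    """Return features whose chi-square score meets the threshold.

    docs: list of sets of binary feature names (hypothetical format).
    labels: list of booleans, True for positive training samples.
    threshold: assumed significance cutoff, not taken from the paper.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    vocab = set().union(*docs) if docs else set()
    selected = []
    for feature in sorted(vocab):
        pos_with = sum(1 for d, y in zip(docs, labels) if y and feature in d)
        neg_with = sum(1 for d, y in zip(docs, labels) if not y and feature in d)
        score = chi_square_2x2(
            pos_with, n_pos - pos_with, neg_with, n_neg - neg_with
        )
        if score >= threshold:
            selected.append(feature)
    return selected


def normalized_utility(tp, fp, total_pos, u_r=20.0):
    """Assumed utility: u_r points per true positive, minus one per
    false positive, divided by the maximum possible score."""
    return (u_r * tp - fp) / (u_r * total_pos)
```

Under this assumed utility, a classifier that labels every document positive on a collection with 10 positives and 190 negatives would score (20 * 10 - 190) / (20 * 10) = 0.05, which suggests why a classifier trained without regard to the asymmetric weighting may do poorly on this measure.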



Feature Sets | Naive Bayes | TREC 2004 | TREC 2008 | Voting Perceptron |


Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2004
Where	TREC
Authors	Aaron M. Cohen, Ravi Teja Bhupatiraju, William R. Hersh
