Feature Selection for the Classification of Large Document Collections

Download www.jucs.org

Feature selection methods are often applied in the context of document classification. They are particularly important for processing large data sets that may contain millions of documents and are typically represented by a large number of features, possibly tens of thousands. Processing large data sets thus raises the issue of computational resources, and we often have to find the right trade-off between the size of the feature set and the amount of training data that we can take into account. Furthermore, depending on the selected classification technique, different feature selection methods require different optimization approaches, raising the issue of compatibility between the two. We demonstrate an effective classifier training and feature selection method that is suitable for large data collections. We explore feature selection based on the weights obtained from linear classifiers themselves, trained on a subset of training documents. While most feature weighting schemes score...

Janez Brank, Dunja Mladenic, Marko Grobelnik, Natasa Milic-Frayling
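
The idea stated in the abstract, ranking features by the weights of a linear classifier trained on a subset of the training documents, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: documents are sparse dicts mapping a feature index to a value, labels are +1 or -1, a plain perceptron stands in for whichever linear classifier the authors use, and the function names (train_perceptron, top_k_features, project) and the toy data are hypothetical.

```python
def train_perceptron(docs, labels, n_features, epochs=10):
    """Train a plain perceptron (a stand-in linear classifier).

    docs   -- list of sparse documents, each a dict {feature_index: value}
    labels -- list of +1 / -1 class labels
    Returns the weight vector and the bias.
    """
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(docs, labels):
            score = b + sum(w[j] * v for j, v in x.items())
            if y * score <= 0:  # misclassified (or on the boundary): update
                for j, v in x.items():
                    w[j] += y * v
                b += y
    return w, b


def top_k_features(train_docs, train_labels, n_features, k):
    """Rank features by the absolute value of their learned weight, keep k."""
    w, _ = train_perceptron(train_docs, train_labels, n_features)
    ranked = sorted(range(n_features), key=lambda j: abs(w[j]), reverse=True)
    return sorted(ranked[:k])


def project(doc, kept):
    """Drop every feature of a sparse document that was not selected."""
    kept = set(kept)
    return {j: v for j, v in doc.items() if j in kept}


# Toy collection (hypothetical): feature 0 marks the positive class,
# feature 1 the negative class, features 2 and 3 occur in both classes.
all_docs = [
    {0: 1, 2: 1}, {1: 1, 2: 1}, {0: 1, 3: 1},
    {1: 1, 3: 1}, {0: 1, 2: 1, 3: 1}, {1: 1, 2: 1, 3: 1},
]
all_labels = [+1, -1, +1, -1, +1, -1]

# Train only on a subset (here, the first four documents), then reuse the
# selected features for the whole collection.
subset_docs, subset_labels = all_docs[:4], all_labels[:4]
kept = top_k_features(subset_docs, subset_labels, n_features=4, k=2)
reduced = [project(d, kept) for d in all_docs]
```

On this toy data the two class-indicating features receive the largest absolute weights, so the reduced documents keep only features 0 and 1. In a realistic setting the reduced feature set would then be used to train the final classifier on a larger share of the training data, which is the trade-off between feature-set size and training-set size that the abstract describes.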

Documents | Feature Selection | Feature Selection Method | JUCS 2008 |

» Feature Generation Feature Selection Classifiers and Conceptual Drift for Biomedical Docum...

» A robust front page detection algorithm for large periodical collections

» Scalable association-based text classification

» Fast dimension reduction for document classification based on imprecise spectrum analysis

» Conditional Feature Sensitivity A Unifying View on Active Recognition and Feature Selectio...

» A Neural-Evolutionary Approach for Feature and Architecture Selection in Online Handwriting...

» FaceTracer A Search Engine for Large Collections of Images with Faces

» Scalable Classification in Large Scale Spatiotemporal Domains Applied to Voltage-Sensitive ...

Post Info

Added	13 Dec 2010
Updated	13 Dec 2010
Type	Journal
Year	2008
Where	JUCS
Authors	Janez Brank, Dunja Mladenic, Marko Grobelnik, Natasa Milic-Frayling
