ISIWI 2000

Automatic Document Classification - A thorough Evaluation of various Methods

Source: www.informationswissenschaft.org

(Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. In this paper we thoroughly evaluate a wide variety of these methods on a document classification task for German text. We evaluate different feature construction and selection methods and various classifiers. Our main results are: (1) feature selection is necessary not only to reduce learning and classification time, but also to avoid overfitting (even for Support Vector Machines); (2) surprisingly, our morphological analysis does not improve classification quality compared to a letter 5-gram approach; (3) Support Vector Machines are significantly better than all other classification methods.
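The abstract contrasts letter 5-gram features with morphological analysis and stresses that feature selection is necessary. The sketch below illustrates those two steps in plain Python. It is not the authors' implementation. The preprocessing (lowercasing, collapsing non-letters to single spaces, padding with one space) and the chi-square selection criterion are assumptions, because the abstract names neither. `letter_ngrams`, `chi_square_scores` and `select_features` are hypothetical helpers.

```python
from collections import Counter


def letter_ngrams(text, n=5):
    """Return a Counter of overlapping letter n-grams (default: 5-grams).

    Assumed preprocessing (not from the paper): lowercase, replace every
    non-letter with a space, collapse whitespace, pad with one space.
    str.isalpha() keeps German umlauts and ß.
    """
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    padded = " " + " ".join(cleaned.split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def chi_square_scores(docs, labels, target):
    """Chi-square score of each n-gram for one category (presence/absence only).

    Chi-square is an assumed criterion; the abstract does not say which
    feature selection method performed best.
    """
    n = len(docs)
    in_cat = sum(1 for y in labels if y == target)
    df_all, df_cat = Counter(), Counter()
    for feats, y in zip(docs, labels):
        for g in feats:  # count documents containing g, not occurrences
            df_all[g] += 1
            if y == target:
                df_cat[g] += 1
    scores = {}
    for g, df in df_all.items():
        a = df_cat[g]              # in category, contains g
        b = df - a                 # outside category, contains g
        c = in_cat - a             # in category, lacks g
        d = n - in_cat - b         # outside category, lacks g
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[g] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, labels, k):
    """Keep the k n-grams with the highest chi-square over any category."""
    best = Counter()
    for cat in set(labels):
        for g, s in chi_square_scores(docs, labels, cat).items():
            best[g] = max(best[g], s)
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [g for g, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy German corpus, invented purely for illustration.
    texts = ["Der Bundestag beschließt das Gesetz",
             "Das Parlament berät ein Gesetz",
             "Der Verein gewinnt das Spiel",
             "Die Mannschaft verliert das Spiel"]
    labels = ["politik", "politik", "sport", "sport"]
    docs = [letter_ngrams(t) for t in texts]
    print(select_features(docs, labels, k=5))
```

Keeping only the top-k n-grams shrinks the feature space before a classifier such as a Support Vector Machine is trained. The value of k in the toy run is arbitrary and would normally be tuned on held-out data.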

Christoph Goller, J. Löning, T. Will, W. Wolff

Related Content

» Graph Coloring for Automatic Recognition of Documents

» Automatic classification in product catalogs

» Bayes-TH-MCRDR Algorithm for Automatic Classification of Web Document

» A phonotactic-semantic paradigm for automatic spoken document classification

» The Choice of Features for Classification of Verbs in Biomedical Texts

» Automatic Evaluation of Linguistic Quality in Multi-Document Summarization

» Ontology Evaluation through Text Classification

» Automatic Expansion of Manual Email Classifications Based on Text Analysis

» Using WordNet to Disambiguate Word Senses for Text Classification

Post Info

Added	01 Nov 2010
Updated	01 Nov 2010
Type	Conference
Year	2000
Where	ISIWI
Authors	Christoph Goller, J. Löning, T. Will, W. Wolff