
GFKL
2006
Springer

Putting Successor Variety Stemming to Work

Stemming algorithms find canonical forms for inflected words, e.g., for declined nouns or conjugated verbs. Since such a unification of words with respect to gender, number, tense, and case is language-specific, stemming algorithms operationalize a set of linguistically motivated rules for the language in question. The best-known rule-based algorithm for English is Porter's [14]. The paper presents a statistical stemming approach that analyzes the distribution of word prefixes in a document collection and is therefore largely language-independent. In particular, our approach addresses the problem of index construction for multilingual documents. Related work on statistical stemming focuses either on stemming quality [2,3] or on runtime performance [11], but neither offers a reasonable tradeoff between the two. For selected retrieval tasks under vector-based document models we report on new results related to stemming quality and collect...
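The abstract names the idea (stemming from the distribution of word prefixes) but not the details. As a hedged illustration only, the sketch below implements the classic letter successor variety measure of Hafer and Weiss: for each prefix of a word, count how many distinct letters follow that prefix somewhere in a vocabulary; a high count suggests a morpheme boundary. The cutoff rule, the threshold of 3, the END marker, the sample vocabulary, and all function names are assumptions chosen for illustration; this is not the authors' method.

```python
from collections import defaultdict

END = "#"  # assumed pseudo-letter marking the end of a word


def build_successor_table(vocabulary):
    """Map every prefix occurring in the vocabulary to the set of letters that follow it."""
    table = defaultdict(set)
    for word in vocabulary:
        for i in range(1, len(word) + 1):
            # The successor of word[:i] is the next letter, or END at the end of the word.
            table[word[:i]].add(word[i] if i < len(word) else END)
    return table


def successor_varieties(word, table):
    """Successor variety of each prefix word[:1], word[:2], ..., word."""
    return [len(table.get(word[:i], ())) for i in range(1, len(word) + 1)]


def stem(word, table, threshold=3):
    """Hypothetical cutoff rule: shortest prefix whose successor variety reaches threshold."""
    for i, variety in enumerate(successor_varieties(word, table), start=1):
        if variety >= threshold:
            return word[:i]
    return word  # no segmentation point found: keep the word unchanged


vocab = ["connect", "connected", "connecting", "connection", "connects", "contact", "contain"]
table = build_successor_table(vocab)
print(successor_varieties("connecting", table))  # [1, 1, 2, 1, 1, 1, 4, 2, 1, 1]
print(stem("connecting", table))                  # connect
print(stem("contact", table))                     # contact
```

The prefix "connect" is followed by four different symbols (end of word, "e", "i", "s"), so the cutoff rule segments there. A fixed threshold is the simplest of several segmentation strategies; peak-based or entropy-based rules are common alternatives, and in practice the threshold would have to be tuned per collection.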
Benno Stein, Martin Potthast
Type Conference