Stemming Approaches for East European Languages

In our participation in this CLEF evaluation campaign, our first objective is to propose and evaluate various indexing and search strategies for the Czech language, in the hope of achieving better retrieval effectiveness than the language-independent n-gram approach. Based on the stemming strategy we have used with other languages, we propose two light stemmers for this Slavic language and a third based on a more aggressive suffix-stripping scheme that removes some derivational suffixes. Our second objective is to obtain a better picture of the relative merit of various search engines in exploring Hungarian and Bulgarian documents. Moreover, for the Bulgarian language we developed a new and more aggressive stemmer. To evaluate these solutions we use various IR models, including Okapi, Divergence from Randomness (DFR) and a statistical language model (LM), together with the classical tf.idf vector-processing approach. Our experiments tend to show that for the Bulgarian lan...
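To make the contrast between the two indexing strategies in the abstract concrete, here is a minimal sketch in Python of language-independent character n-gram indexing next to a light suffix-stripping stemmer. The function names, the n-gram length of 4, the minimum stem length and above all the suffix list are assumptions chosen for illustration; they are not the rules of the stemmers proposed in the paper.

```python
def char_ngrams(word, n=4):
    """Return the overlapping character n-grams of a word.

    Words no longer than n are kept whole. The default n=4 is an
    assumption for this sketch, not a value taken from the paper.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


# Hypothetical toy list of inflectional endings, for illustration only.
# These are NOT the suffix rules of the stemmers described in the paper.
TOY_SUFFIXES = ("ami", "ech", "ům", "y", "a", "u", "e")


def light_stem(word, suffixes=TOY_SUFFIXES, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word


if __name__ == "__main__":
    for form in ("hrad", "hradu", "hrady", "hradech"):
        print(form, light_stem(form), char_ngrams(form))
```

In this sketch the stemmer conflates the four inflected forms to one index term, while the n-gram approach needs no language-specific rules and instead relies on the shared substring to match them. A more aggressive stemmer in the sense of the abstract would extend the suffix list to derivational endings as well.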

Ljiljana Dolamic, Jacques Savoy

Bulgarian Language | CLEF 2007 | Czech Language | Derivational Suffixes | Information Retrieval

Added	07 Jun 2010
Updated	07 Jun 2010
Type	Conference
Year	2007
Where	CLEF
Authors	Ljiljana Dolamic, Jacques Savoy
