Single n-gram stemming


Stemming can improve retrieval accuracy, but stemmers are language-specific. Character n-gram tokenization achieves many of the benefits of stemming in a language-independent way, but its use incurs a performance penalty. We demonstrate that selection of a single n-gram as a pseudo-stem for a word can be an effective and efficient language-neutral approach for some languages.

Categories and Subject Descriptors: H.3.1 [Information Systems]: Information Storage and Retrieval – content analysis and indexing.

General Terms: Algorithms
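To make the idea concrete, here is a minimal Python sketch of replacing each word with a single character n-gram. The abstract does not say how the n-gram is chosen, so the rule used here (pick the n-gram with the lowest document frequency in the collection, leftmost on ties) is an assumption for illustration, as are the function names `char_ngrams`, `document_frequencies`, and `pseudo_stem` and the whitespace tokenization.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return the character n-grams of a word, left to right.

    Words no longer than n are returned whole as their only "n-gram".
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def document_frequencies(docs, n):
    """Count, for each n-gram, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        seen = set()
        for word in doc.lower().split():  # naive whitespace tokenization
            seen.update(char_ngrams(word, n))
        df.update(seen)  # each n-gram counted once per document
    return df


def pseudo_stem(word, df, n):
    """Choose one n-gram to index in place of the word.

    Assumed selection rule (not stated in the abstract): the n-gram with
    the lowest document frequency. min() keeps the leftmost n-gram on ties.
    """
    grams = char_ngrams(word.lower(), n)
    return min(grams, key=lambda g: df.get(g, 0))


if __name__ == "__main__":
    docs = ["juggling juggler", "running runner", "juggler runs"]
    df = document_frequencies(docs, 4)
    for w in ["juggling", "juggler", "running"]:
        print(w, "->", pseudo_stem(w, df, 4))
```

Because each word contributes one index term instead of one term per n-gram, the index stays about as small as a word or stem index, which is where the efficiency gain over full n-gram tokenization would come from.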

James Mayfield, Paul McNamee


Character N-gram Tokenization | Language Independent Way | N-gram | SIGIR 2003


Post Info

Added	05 Jul 2010
Updated	05 Jul 2010
Type	Conference
Year	2003
Where	SIGIR
Authors	James Mayfield, Paul McNamee
