Stemming Indonesian

Stemming words to (usually) remove suffixes has applications in text search, machine translation, document summarisation, and text classification. For example, English stemming reduces the words “computer”, “computing”, “computation”, and “computability” to their common morphological root, “comput-”. In text search, this permits a search for “computers” to find documents containing all words with the stem “comput-”. In the Indonesian language, stemming is of crucial importance: words have prefixes, suffixes, infixes, and confixes that make matching related words difficult. In this paper, we investigate the performance of five Indonesian stemming algorithms through a user study. Our results show that, with the availability of a reasonable dictionary, the unpublished algorithm of Nazief and Adriani correctly stems around 93% of word occurrences to the correct root word. With the improvements we propose, this almost reaches 95%. We conclude that stemming...
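
The English example in the abstract can be illustrated with a minimal sketch. This is a toy suffix stripper, not one of the five Indonesian algorithms the paper evaluates and not the Nazief and Adriani algorithm; the suffix list, the minimum stem length, and the names toy_stem and search are all assumptions made for illustration.

```python
# Toy suffix-stripping stemmer (illustrative only; not an algorithm
# from the paper).

# Hypothetical suffix list, ordered longest first so that "ability"
# is tried before shorter suffixes such as "s".
SUFFIXES = ["ability", "ation", "ers", "ing", "er", "s"]
MIN_STEM_LENGTH = 3  # assumed guard against over-stemming short words


def toy_stem(word):
    """Strip the first matching suffix, keeping at least MIN_STEM_LENGTH letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def search(query, documents):
    """Return the documents that share at least one stem with the query."""
    query_stems = {toy_stem(term) for term in query.split()}
    return [
        doc for doc in documents
        if query_stems & {toy_stem(term) for term in doc.split()}
    ]


docs = ["theory of computation", "computing machinery", "garden design"]
print(search("computers", docs))
# prints ['theory of computation', 'computing machinery']
```

Stripping only suffixes is enough for this English example. Indonesian words also carry prefixes, infixes, and confixes, which a plain suffix stripper like this one would miss.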

Jelita Asian, Hugh E. Williams, Seyed M. M. Tahaghoghi

ACSC 2005 | Common Morphological Root | Computer Science | Text Search | five Indonesian Stemming

Added	24 Jun 2010
Updated	24 Jun 2010
Type	Conference
Year	2005
Where	ACSC
Authors	Jelita Asian, Hugh E. Williams, Seyed M. M. Tahaghoghi
