WWW
2001
ACM

Indexing the Indonesian Web: Language Identification and Miscellaneous Issues

Information retrieval tools and search engines have mainly leveraged research results and technologies developed for the English language. In this paper we report the issues and obstacles we met in the process of designing and developing a search engine for the Indonesian language, as well as our progress and results. The results include original contributions such as a grammar for stemming Indonesian words and a self-improving language identification algorithm.
Keywords: Indonesian language, search engine, web-crawler, stemming, language identification, supervised learning, unsupervised learning.
Stéphane Bressan, Vinsensius Berlian Vega SN
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2001
Where WWW
Authors Stéphane Bressan, Vinsensius Berlian Vega SN