
IIS
2003

Web Search Results Clustering in Polish: Experimental Evaluation of Carrot

Abstract. In this paper we consider the problem of web search results clustering in the Polish language, supporting our analysis with results acquired from an experimental system named Carrot. The algorithm under consideration, Suffix Tree Clustering (STC), has been acknowledged as very efficient when applied to English. We present conclusions from its experimental application to Polish, indicating fragile areas where the algorithm seems to fail due to specific properties of the input data. We show that the characteristics of the produced clusters (number, value), unlike in English, strongly depend on the pre-processing phase. We also investigate the influence of two primary STC parameters, merge threshold and minimum base cluster score, on the number and quality of results. Finally, we introduce two approaches to efficient, approximate stemming of Polish words: a quasi-stemmer and an automaton-based method.

1 Search results clustering overview Together with an exponentia...
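
To illustrate the two STC parameters named in the abstract, here is a minimal Python sketch of the base cluster filtering and merging step as it is commonly described for Suffix Tree Clustering. It is not taken from Carrot. The function names, the default values, and the `phrase_weight` function are assumptions made for illustration, and the suffix tree construction that would normally produce the base clusters is replaced by a hand-written dictionary.

```python
from itertools import combinations


def phrase_weight(n_words):
    # Assumed weighting: single-word phrases are penalised, longer phrases
    # grow linearly up to a cap of six words. The exact values are illustrative.
    if n_words <= 1:
        return 0.5
    return float(min(n_words, 6))


def base_cluster_score(phrase, docs):
    # Score = number of documents containing the phrase * phrase weight.
    return len(docs) * phrase_weight(len(phrase.split()))


def merge_base_clusters(base_clusters, merge_threshold=0.5,
                        min_base_cluster_score=2.0):
    """base_clusters maps a phrase to the set of document ids containing it."""
    # Step 1: drop empty or low-scoring base clusters.
    kept = {p: d for p, d in base_clusters.items()
            if d and base_cluster_score(p, d) >= min_base_cluster_score}

    # Step 2: link two base clusters when their document overlap exceeds
    # the merge threshold relative to BOTH clusters (union-find for grouping).
    parent = {p: p for p in kept}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(kept, 2):
        common = len(kept[a] & kept[b])
        if (common / len(kept[a]) > merge_threshold
                and common / len(kept[b]) > merge_threshold):
            parent[find(a)] = find(b)

    # Step 3: each connected component becomes one final cluster.
    groups = {}
    for p in kept:
        groups.setdefault(find(p), []).append(p)
    return [(sorted(phrases), set().union(*(kept[p] for p in phrases)))
            for phrases in groups.values()]


# Hypothetical base clusters (phrase -> document ids), for illustration only.
example = {
    "web search": {1, 2, 3},
    "search results": {2, 3, 4},
    "polish": {5},
    "suffix tree": {6, 7},
}
clusters = merge_base_clusters(example)
```

In this sketch, raising `merge_threshold` tends to leave more, smaller clusters, while raising `min_base_cluster_score` discards weak phrases before merging. Both effects are the kind of parameter sensitivity the abstract refers to.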
Dawid Weiss, Jerzy Stefanowski
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003
Where IIS
Authors Dawid Weiss, Jerzy Stefanowski