IR
2006

Hierarchical clustering of a Finnish newspaper article collection with graded relevance assessments

Search facilitated by agglomerative hierarchical clustering methods was studied on a collection of Finnish newspaper articles (N = 53,893). To allow quick experiments, clustering was applied to a sample (N = 5,000) whose dimensionality was reduced with principal components analysis. The dendrograms were cut heuristically to find an optimal partition, and the clusters of that partition were compared with each of the 30 queries to retrieve the best-matching cluster. The four-level relevance assessments were collapsed into binary ones by treating as relevant either (A) all relevant documents or (B) only the highly relevant documents. Single linkage (SL) was the worst method: it created many tiny clusters, and, consequently, searches based on it had high precision and low recall. Complete linkage (CL), average linkage (AL), and Ward's method (WM) returned reasonably sized clusters, typically of 18-32 documents. Their recall (A: 27-52%, B: 50-82%) and precision (A: 83-90%, B: 18-21%) were higher than and compara...
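The retrieval-and-evaluation step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a graded scale of 0-3 in which 3 means "highly relevant", represents each cluster as a non-empty list of document ids, and picks the best-matching cluster by cosine similarity between the query vector and the cluster centroid (the matching function used in the paper is not given here). All function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set

# ASSUMPTION: four-level graded scale 0 (non-relevant) .. 3 (highly relevant).
HIGHLY_RELEVANT = 3


def binary_relevant(grades: Dict[str, int], strategy: str) -> Set[str]:
    """Collapse graded judgments into a set of relevant doc ids.

    Strategy "A": every document with grade >= 1 counts as relevant.
    Strategy "B": only documents with the top grade count as relevant.
    """
    if strategy == "A":
        return {d for d, g in grades.items() if g >= 1}
    if strategy == "B":
        return {d for d, g in grades.items() if g >= HIGHLY_RELEVANT}
    raise ValueError(f"unknown strategy: {strategy!r}")


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def best_cluster(clusters: List[List[str]],
                 doc_vecs: Dict[str, Sequence[float]],
                 query_vec: Sequence[float]) -> List[str]:
    """HYPOTHETICAL matching rule: cluster whose centroid is most similar to the query."""
    return max(clusters,
               key=lambda c: cosine(centroid([doc_vecs[d] for d in c]), query_vec))


def precision_recall(retrieved: List[str], relevant: Set[str]) -> tuple:
    """Precision and recall of the retrieved cluster against a binary relevant set."""
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data (invented for illustration only).
    doc_vecs = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0], "d4": [0.1, 0.9]}
    clusters = [["d1", "d2"], ["d3", "d4"]]
    grades = {"d1": 3, "d2": 1, "d3": 0, "d4": 2}
    query = [1.0, 0.0]

    best = best_cluster(clusters, doc_vecs, query)
    print(best)  # ['d1', 'd2']
    for s in ("A", "B"):
        print(s, precision_recall(best, binary_relevant(grades, s)))
    # A (1.0, 0.6666666666666666)
    # B (0.5, 1.0)
```

In this toy run the stricter strategy (B) shrinks the relevant set, so the same retrieved cluster scores lower precision and higher recall, the same direction as the A/B contrast reported in the abstract.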
Added 13 Dec 2010
Updated 13 Dec 2010
Type Journal
Year 2006
Where IR
Authors Tuomo Korenius, Jorma Laurikkala, Martti Juhola, Kalervo Järvelin