
DEXAW
2008
IEEE

Proximity Estimation and Hardness of Short-Text Corpora

Abstract—In this work, we investigate the relative hardness of short-text corpora in clustering problems and how this hardness relates to traditional similarity measures. Our approach attempts to establish a connection between the hardness of a corpus and the precision level exhibited by similarity measures, based on the results obtained with different cluster validity measures on the "ideal" clustering of each corpus. Moreover, we propose a new validity measure, named contiguity error, which allowed us to observe this connection consistently across all the collections considered.
Added 29 May 2010
Updated 29 May 2010
Type Conference
Year 2008
Where DEXAW
Authors Marcelo Luis Errecalde, Diego Ingaramo, Paolo Rosso