Proximity Estimation and Hardness of Short-Text Corpora


Abstract: In this work, we investigate the relative hardness of short-text corpora in clustering problems and how this hardness relates to traditional similarity measures. Our approach attempts to establish a connection between the hardness of a corpus and the precision level exhibited by similarity measures, according to the results obtained with different cluster validity measures on the “ideal” clustering of each corpus. We also propose a new validity measure, named contiguity error, that allowed us to observe this connection consistently across all the collections considered.
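As a rough illustration of what it means to score a similarity measure by applying a cluster validity measure to the “ideal” clustering, the sketch below computes a mean silhouette coefficient over gold-standard labels, using cosine distance on bag-of-words vectors. This is only an assumption-laden example: the silhouette coefficient and cosine distance are stand-ins chosen here, the function names are hypothetical, and the code does not implement the contiguity error proposed in the paper, whose definition is not given in this listing.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)


def silhouette_of_gold(docs, labels):
    """Mean silhouette of a given (gold) partition under cosine distance.

    Hypothetical helper: a higher value suggests that the similarity
    measure agrees better with the gold-standard classes.
    """
    vecs = [Counter(d.lower().split()) for d in docs]
    scores = []
    for i, li in enumerate(labels):
        # Group distances from document i to every other document by cluster.
        by_cluster = {}
        for j, lj in enumerate(labels):
            if j != i:
                by_cluster.setdefault(lj, []).append(cosine_distance(vecs[i], vecs[j]))
        # Singleton cluster, or no other cluster: silhouette is taken as 0.
        if li not in by_cluster or len(by_cluster) < 2:
            scores.append(0.0)
            continue
        a = sum(by_cluster[li]) / len(by_cluster[li])  # mean intra-cluster distance
        b = min(sum(d) / len(d) for l, d in by_cluster.items() if l != li)  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    docs = [
        "cluster validity measure",
        "validity measure cluster",
        "soccer match goal",
        "goal soccer match",
    ]
    print(silhouette_of_gold(docs, [0, 0, 1, 1]))
```

Under this reading, a corpus whose gold partition scores poorly with every similarity measure tried would be a candidate for a "hard" corpus, though how the paper itself operationalizes hardness is not specified here.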

Marcelo Luis Errecalde, Diego Ingaramo, Paolo Rosso


Database | DEXAW 2008 | Similarity Measures | Traditional Similarity Measures | Validity Measure |


Post Info

Added	29 May 2010
Updated	29 May 2010
Type	Conference
Year	2008
Where	DEXAW
Authors	Marcelo Luis Errecalde, Diego Ingaramo, Paolo Rosso
