Evaluation of Internal Validity Measures in Short-Text Corpora


Short-text clustering is one of the most difficult tasks in natural language processing because of the low frequencies of document terms. We are interested in analysing this kind of corpora in order to develop novel techniques that may improve the results obtained by classical clustering algorithms. In this paper we present an evaluation of different internal clustering validity measures in order to determine the possible correlation between these measures and the F-Measure, a well-known external clustering measure used to assess the performance of clustering algorithms. We used several short-text corpora in the experiments. The correlation obtained with a particular set of internal validity measures lets us conclude that some of them may be used to improve the performance of text clustering algorithms.
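The kind of comparison the abstract describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the abstract names neither the exact F-Measure variant nor the internal measures or correlation coefficient used, so the sketch assumes the common class-weighted clustering F-Measure and Pearson correlation. The function names, the toy labels and the `internal_scores` values are all hypothetical.

```python
from collections import Counter
from math import sqrt


def clustering_f_measure(gold, pred):
    """Class-weighted F-Measure (assumed variant): each gold class is
    matched to the cluster that gives it the highest F score."""
    n = len(gold)
    class_sizes = Counter(gold)
    cluster_sizes = Counter(pred)
    overlap = Counter(zip(gold, pred))  # documents shared by (class, cluster)
    total = 0.0
    for c, c_size in class_sizes.items():
        best = 0.0
        for k, k_size in cluster_sizes.items():
            n_ck = overlap[(c, k)]
            if n_ck == 0:
                continue
            precision = n_ck / k_size
            recall = n_ck / c_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (c_size / n) * best
    return total


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data (hypothetical): one gold labelling and three candidate clusterings.
gold = ["a", "a", "a", "b", "b", "b"]
clusterings = [
    [0, 0, 0, 1, 1, 1],  # matches the gold classes exactly
    [0, 0, 1, 1, 1, 1],  # one document misplaced
    [0, 1, 0, 1, 0, 1],  # classes mixed across clusters
]
# Placeholder scores standing in for some internal validity measure;
# in practice they would be computed from the document vectors alone.
internal_scores = [0.9, 0.6, 0.1]

f_scores = [clustering_f_measure(gold, p) for p in clusterings]
print("F-Measure per clustering:", f_scores)
print("Correlation with internal measure:", pearson(internal_scores, f_scores))
```

A strong positive correlation would suggest that the internal measure, which needs no gold labels, ranks clusterings in roughly the same order as the F-Measure does.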

Diego Ingaramo, David Pinto, Paolo Rosso, Marcelo Errecalde

CICLING 2008 | Classical Clustering Algorithms | Clustering Algorithms | Clustering Validity Measures | Natural Language Processing

» Spam filtering for short messages

» Assessing Dialog System User Simulation Evaluation Measures Using Human Judges

» Phrase Translation Extraction from Aligned Parallel Corpora Using Suffix Arrays and Relate...

» Definition and Validation of Design Metrics for Distributed Applications

» An Evaluation Framework for Plagiarism Detection

» Evaluating evaluation measure stability

» Unbounded Dependency Recovery for Parser Evaluation

» An overview and framework for PD backtesting and benchmarking

Post Info

Added	12 Oct 2010
Updated	12 Oct 2010
Type	Conference
Year	2008
Where	CICLING
Authors	Diego Ingaramo, David Pinto, Paolo Rosso, Marcelo Errecalde
