Sciweavers

LREC
2010

Text Cluster Trimming for Better Descriptions and Improved Quality

14 years 28 days ago
Text Cluster Trimming for Better Descriptions and Improved Quality
Text clustering is potentially very useful for exploration of text sets that are too large to study manually. The success of such a tool depends on whether the results can be explained to the user. An automatically extracted cluster description usually consists of a few words that are deemed representative for the cluster. It is preferably short in order to be easily grasped. However, text cluster content is often diverse. We introduce a trimming method that removes texts that do not contain any, or a few of the words in the cluster description. The result is clusters that match their descriptions better. In experiments on two quite different text sets we obtain significant improvements in both internal and external clustering quality for the trimmed clustering compared to the original. The trimming thus has two positive effects: it forces the clusters to agree with their descriptions (resulting in better descriptions) and improves the quality of the trimmed clusters.
Magnus Rosell
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2010
Where LREC
Authors Magnus Rosell
Comments (0)