DEXAW
2009
IEEE

Clustering of Short Strings in Large Databases

Abstract: A novel method, CLOSS, is proposed for textual databases. It identifies clusters of misspelled strings even when the cluster border is not prominent. The method represents the data with q-grams and uses a string proximity graph to find the cluster. The contribution addresses short string clustering in text mining for the cases where the proximity graph has multiple horizontal lines or no horizontal line at all.
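
The q-gram representation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the CLOSS algorithm itself, which the abstract does not specify. The function names (`qgrams`, `overlap`, `neighbours`), the padding character `#`, the choice q = 2, and the threshold value are all assumptions made for illustration.

```python
from collections import Counter


def qgrams(s: str, q: int = 2, pad: str = "#") -> Counter:
    """Return the multiset of q-grams of s.

    The string is padded with q-1 pad characters on each side so that
    the first and last characters appear in as many q-grams as the rest
    (a common convention, assumed here).
    """
    padded = pad * (q - 1) + s + pad * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def overlap(a: str, b: str, q: int = 2) -> int:
    """Count the q-grams shared by a and b (multiset intersection size)."""
    return sum((qgrams(a, q) & qgrams(b, q)).values())


def neighbours(center: str, strings: list[str], threshold: int, q: int = 2) -> list[str]:
    """Return the strings sharing at least `threshold` q-grams with `center`."""
    return [s for s in strings if overlap(center, s, q) >= threshold]


if __name__ == "__main__":
    words = ["streat", "road", "stret"]
    for w in words:
        print(w, overlap("street", w))
    print(neighbours("street", words, threshold=4))
```

Misspellings of a short string tend to share most of its q-grams, while unrelated strings share few or none. Varying the threshold and recording how many strings fall within it gives one possible way to look for a cluster border, though how CLOSS builds and reads its proximity graph is described in the paper, not here.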
Michail Kazimianec, Arturas Mazeika
Added 20 May 2010
Updated 20 May 2010
Type Conference
Year 2009
Where DEXAW
Authors Michail Kazimianec, Arturas Mazeika