ICDM
2007
IEEE

GDClust: A Graph-Based Document Clustering Technique

This paper introduces a new document clustering technique based on frequent senses. The proposed system, GDClust (Graph-Based Document Clustering), works with frequent senses rather than the frequent keywords used in traditional text mining techniques. GDClust represents text documents as hierarchical document-graphs and uses an Apriori paradigm to find frequent subgraphs, which reflect frequent senses. The discovered frequent subgraphs are then used to generate sense-based document clusters. We propose a novel multilevel Gaussian minimum support approach for candidate subgraph generation. GDClust uses an English-language ontology to construct document-graphs and exploits graph-based data mining techniques for sense discovery and clustering. It is an automated system and requires minimal human interaction for clustering.
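To make the Apriori step more concrete, here is a minimal Python sketch of level-wise frequent edge-set discovery with a level-dependent, Gaussian-shaped minimum support. It is not the authors' implementation. Several things are assumptions made for illustration only: each document-graph is reduced to a plain set of (child, parent) edges, edge sets stand in for subgraphs (no connectivity or isomorphism check), the `edge_level` mapping and the parameters `peak`, `center`, and `sigma` are hypothetical, and the rule that a candidate takes the strictest threshold among its edges is a choice made here so that Apriori pruning stays sound.

```python
import math
from itertools import combinations


def gaussian_min_support(level, peak=0.6, center=2.0, sigma=1.0):
    """Hypothetical threshold: a Gaussian curve over the ontology level."""
    return peak * math.exp(-((level - center) ** 2) / (2 * sigma ** 2))


def frequent_edge_sets(doc_graphs, edge_level, **gauss):
    """Apriori-style search over edge sets (a simplification of subgraphs)."""
    n = len(doc_graphs)

    def support(edges):
        # Fraction of document-graphs that contain every edge in the set.
        return sum(1 for g in doc_graphs if edges <= g) / n

    def threshold(edges):
        # Assumption: use the strictest threshold among the set's edges,
        # so a superset never has a lower threshold than its subsets.
        return max(gaussian_min_support(edge_level[e], **gauss) for e in edges)

    all_edges = set().union(*doc_graphs)
    current = {frozenset([e]) for e in all_edges}
    current = {c for c in current if support(c) >= threshold(c)}

    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: merge pairs of frequent (k-1)-sets into k-sets.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= threshold(c)}
    return frequent


# Toy document-graphs: edges point from a concept to its more abstract parent.
docs = [
    {("dog", "canine"), ("canine", "animal")},
    {("cat", "feline"), ("feline", "animal"), ("canine", "animal")},
    {("dog", "canine"), ("canine", "animal")},
]
edge_level = {
    ("dog", "canine"): 3,
    ("cat", "feline"): 3,
    ("canine", "animal"): 2,
    ("feline", "animal"): 2,
}
result = frequent_edge_sets(docs, edge_level)
```

In this toy data, the edge sets that survive are the ones shared by most documents. A real system would still need to turn the discovered frequent subgraphs into clusters, which this sketch does not attempt.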
M. Shahriar Hossain, Rafal A. Angryk
Added 03 Jun 2010
Updated 03 Jun 2010
Type Conference
Year 2007
Where ICDM
Authors M. Shahriar Hossain, Rafal A. Angryk