GDClust: A Graph-Based Document Clustering Technique



This paper introduces a new document clustering technique based on frequent senses. The proposed system, GDClust (Graph-Based Document Clustering), works with frequent senses rather than the frequent keywords used in traditional text mining techniques. GDClust represents text documents as hierarchical document-graphs and uses an Apriori paradigm to find frequent subgraphs, which reflect frequent senses. The discovered frequent subgraphs are then used to generate sense-based document clusters. We propose a novel multilevel Gaussian minimum support approach for candidate subgraph generation. GDClust uses an English-language ontology to construct document-graphs and exploits a graph-based data mining technique for sense discovery and clustering. It is an automated system that requires minimal human interaction for clustering.
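
The pipeline described in the abstract above (document-graphs, Apriori-style frequent subgraph discovery, sense-based clustering) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the `PARENT` table is a hypothetical toy ontology, a "subgraph" is simplified to a set of edges with no connectivity or isomorphism checks, a single fixed `min_support` stands in for the multilevel Gaussian minimum support, and the threshold grouping in `cluster` is a placeholder for whatever clustering step GDClust actually uses. All function names are illustrative.

```python
from itertools import combinations

# Hypothetical toy ontology (an assumption): each term maps to a more general parent.
PARENT = {
    "dog": "canine", "wolf": "canine", "canine": "animal",
    "cat": "feline", "feline": "animal",
    "car": "vehicle", "truck": "vehicle", "vehicle": "artifact",
}

def document_graph(words):
    """Collect the (child, parent) edges on the path from each known word to its root."""
    edges = set()
    for w in words:
        while w in PARENT:
            edges.add((w, PARENT[w]))
            w = PARENT[w]
    return frozenset(edges)

def frequent_subgraphs(graphs, min_support):
    """Apriori-style level-wise search; a 'subgraph' here is just a set of edges."""
    all_edges = set().union(*graphs)
    # Level 1: single edges that occur in at least min_support document-graphs.
    level = {frozenset([e]) for e in all_edges
             if sum(e in g for g in graphs) >= min_support}
    frequent = set(level)
    while level:
        # Join step: merge two k-edge sets that differ in exactly one edge.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        # Prune step (Apriori property), then count support.
        level = {c for c in candidates
                 if all(c - {e} in frequent for e in c)
                 and sum(c <= g for g in graphs) >= min_support}
        frequent |= level
    return frequent

def senses_of(graph, frequent):
    """Frequent subgraphs contained in one document-graph."""
    return {s for s in frequent if s <= graph}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(graphs, frequent, threshold=0.5):
    """Greedy grouping: join the first cluster whose seed document is similar enough."""
    clusters = []
    for i, g in enumerate(graphs):
        v = senses_of(g, frequent)
        for c in clusters:
            if jaccard(v, senses_of(graphs[c[0]], frequent)) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

docs = [["dog", "wolf"], ["dog", "cat"], ["car", "truck"], ["truck"]]
graphs = [document_graph(d) for d in docs]
frequent = frequent_subgraphs(graphs, min_support=2)
print(cluster(graphs, frequent))  # [[0, 1], [2, 3]]
```

In this toy run the first two documents share the frequent edge set dog → canine → animal and the last two share truck → vehicle → artifact, so they are grouped by shared sense rather than by shared keywords: documents 2 and 3 would still cluster together even if one mentioned only "car" and the other only "truck", provided enough other documents supported the shared vehicle → artifact edge.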

M. Shahriar Hossain, Rafal A. Angryk


Data Mining | Document Clustering | Frequent Senses | Frequent Subgraphs | ICDM 2007


Post Info

Added	03 Jun 2010
Updated	03 Jun 2010
Type	Conference
Year	2007
Where	ICDM
Authors	M. Shahriar Hossain, Rafal A. Angryk
