Sciweavers

ICCS
2009
Springer

Frequent Itemset Mining for Clustering Near Duplicate Web Documents

A vast number of documents on the Web have duplicates, which poses a challenge for developing efficient methods that compute clusters of similar documents. In this paper we use an approach based on computing (closed) sets of attributes having large support (large extent) as clusters of similar documents. The method is tested in a series of computer experiments on large public collections of web documents and compared with other established methods and software, such as biclustering, on the same datasets. The practical efficiency of different algorithms for computing frequent closed sets of attributes is also compared.
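To make the idea of "closed sets of attributes with large support as clusters" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the choice of word 3-shingles as attributes, the helper names (shingles, extent, frequent_closed_sets), the min_support threshold, and the naive enumeration by repeated intersection are all hypothetical. The abstract does not say which attributes are used, and the algorithms compared in the paper are not reproduced here.

```python
def shingles(text, k=3):
    """Word k-shingles of a text (an assumed, hypothetical choice of attributes)."""
    words = text.lower().split()
    return frozenset(" ".join(words[i:i + k]) for i in range(len(words) - k + 1))


def extent(attrs, docs):
    """Names of the documents whose attribute sets contain every attribute in attrs."""
    return frozenset(name for name, items in docs.items() if attrs <= items)


def frequent_closed_sets(docs, min_support=2):
    """Naively enumerate closed attribute sets with support >= min_support.

    Every closed attribute set is an intersection of document attribute sets,
    so we close the family of document sets under intersection. The empty
    attribute set (whose extent is trivially all documents) is skipped.
    This is exponential in the worst case and only meant for tiny examples.
    """
    closed = set(docs.values())
    frontier = set(closed)
    while frontier:
        new = set()
        for a in frontier:
            for b in closed:
                c = a & b
                if c and c not in closed:
                    new.add(c)
        closed |= new
        frontier = new

    result = {}
    for attrs in closed:
        ext = extent(attrs, docs)
        if len(ext) >= min_support:
            result[attrs] = ext  # closed attribute set -> cluster of documents
    return result


# Toy collection: d1 and d2 are near duplicates, d3 is unrelated.
docs = {
    "d1": shingles("the quick brown fox jumps over the lazy dog"),
    "d2": shingles("the quick brown fox jumps over the lazy cat"),
    "d3": shingles("an entirely different page about data mining"),
}
clusters = frequent_closed_sets(docs, min_support=2)
for attrs, members in clusters.items():
    print(sorted(members), "share", len(attrs), "shingles")
```

In this toy collection the only closed attribute set supported by at least two documents is the set of shingles common to d1 and d2, so those two documents form the single cluster. Raising min_support asks for larger clusters; on real collections one would typically also require a minimum number of shared attributes, which is again an assumption rather than something stated in the abstract.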
Dmitry I. Ignatov, Sergei O. Kuznetsov
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where ICCS
Authors Dmitry I. Ignatov, Sergei O. Kuznetsov