Sciweavers

Search results for: A Multi-level Approach for Document Clustering
ICDM
2007
IEEE
Bit Sequences and Biclustering of Text Documents
We propose a new technique for clustering of text documents that relies on a biclustering structure constructed on terms and documents. Our approach makes use of a greedy algorith...
Selim Mimaroglu, Kuniaki Uehara
SIGIR
2006
ACM
Near-duplicate detection by instance-level constrained clustering
For the task of near-duplicate document detection, both traditional fingerprinting techniques used in the database community and bag-of-word comparison approaches used in information...
Hui Yang, James P. Callan
WIDM
2003
ACM
Clustering documents in a web directory
Hierarchical categorization of documents is a task receiving growing interest due to the widespread proliferation of topic hierarchies for text documents. The worst problem of hie...
Giordano Adami, Paolo Avesani, Diego Sona
ICDAR
2011
IEEE
Word Retrieval in Historical Document Using Character-Primitives
Word searching and indexing in historical document collections is a challenging problem because characters in these documents are often touching or broken due to degradation/agei...
Partha Pratim Roy, Jean-Yves Ramel, Nicolas Ragot
ICCS
2009
Springer
Frequent Itemset Mining for Clustering Near Duplicate Web Documents
Many documents on the Web have duplicates, which makes it challenging to develop efficient methods for computing clusters of similar documents. In this paper we use ...
Dmitry I. Ignatov, Sergei O. Kuznetsov