Sciweavers

» A Multi-level Approach for Document Clustering
ICDM
2006
IEEE
14 years 1 month ago
High Quality, Efficient Hierarchical Document Clustering Using Closed Interesting Itemsets
High dimensionality remains a significant challenge for document clustering. Recent approaches used frequent itemsets and closed frequent itemsets to reduce dimensionality, and to...
Hassan H. Malik, John R. Kender
SAC
2009
ACM
14 years 2 months ago
Combining statistics and semantics via ensemble model for document clustering
Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge p...
Samah Jamal Fodeh, William F. Punch, Pang-Ning Tan
SIGIR
2009
ACM
14 years 2 months ago
A latent topic model for linked documents
Documents in many corpora, such as digital libraries and webpages, contain both content and link information. To explicitly consider the document relations represented by links, i...
Zhen Guo, Shenghuo Zhu, Yun Chi, Zhongfei Zhang, Y...
ICDAR
2009
IEEE
14 years 2 months ago
Enhanced Text Extraction from Arabic Degraded Document Images Using EM Algorithm
This paper presents a new enhanced algorithm for extracting text from degraded document images on the basis of probabilistic models. The observed document image is considered as a...
Wafa Boussellaa, Aymen Bougacha, Abderrazak Zahour...
ACL
2008
13 years 9 months ago
Learning Document-Level Semantic Properties from Free-Text Annotations
This paper demonstrates a new method for leveraging unstructured annotations to infer semantic document properties. We consider the domain of product reviews, which are often anno...
S. R. K. Branavan, Harr Chen, Jacob Eisenstein, Re...