Sciweavers

» Categories of Containers
KDD 2008 (ACM), Data Mining
Structured entity identification and document categorization: two tasks with one joint model
Traditionally, research in identifying structured entities in documents has proceeded independently of document categorization research. In this paper, we observe that these two t...
Indrajit Bhattacharya, Shantanu Godbole, Sachindra...
KDD 2008 (ACM), Data Mining
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
KDD 2008 (ACM), Data Mining
Identifying biologically relevant genes via multiple heterogeneous data sources
Selection of genes that are differentially expressed and critical to a particular biological process has been a major challenge in post-array analysis. Recent development in bioin...
Zheng Zhao, Jiangxin Wang, Huan Liu, Jieping Ye, Y...
KDD 2008 (ACM), Data Mining
The cost of privacy: destruction of data-mining utility in anonymized data publishing
Re-identification is a major privacy threat to public datasets containing individual records. Many privacy protection algorithms rely on generalization and suppression of "qu...
Justin Brickell, Vitaly Shmatikov
KDD 2008 (ACM), Data Mining
SAIL: summation-based incremental learning for information-theoretic clustering
Information-theoretic clustering aims to exploit information theoretic measures as the clustering criteria. A common practice on this topic is so-called INFO-K-means, which perfor...
Junjie Wu, Hui Xiong, Jian Chen