Sciweavers

KDD
2001
ACM
The "DGX" distribution for mining massive, skewed data
Skewed distributions appear very often in practice. Unfortunately, the traditional Zipf distribution often fails to model them well. In this paper, we propose a new probability di...
Zhiqiang Bi, Christos Faloutsos, Flip Korn
KDD
2001
ACM
Mining massively incomplete data sets by conceptual reconstruction
Charu C. Aggarwal, Srinivasan Parthasarathy
KDD
2001
ACM
Evaluating the novelty of text-mined rules using lexical knowledge
Sugato Basu, Raymond J. Mooney, Krupakar V. Pasupu...
KDD
2001
ACM
Random projection in dimensionality reduction: applications to image and text data
Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however,...
Ella Bingham, Heikki Mannila
KDD
2001
ACM
Classification of genes using probabilistic models of microarray expression profiles
Paul Pavlidis, Christopher Tang, William Stafford ...
KDD
2001
ACM
Hierarchical cluster analysis of SAGE data for cancer profiling
In this paper we present a method for clustering SAGE (Serial Analysis of Gene Expression) data to detect similarities and dissimilarities between different types of cancer on the...
Jörg Sander, Monica C. Sleumer, Raymond T. Ng
KDD
2001
ACM
Learning to recognize brain specific proteins based on low-level features from on-line prediction servers
During the last decade, the area of bioinformatics has produced an overwhelming amount of data, with the recently published draft of the human genome being the most prominent exam...
Henrik Boström, Joakim Cöster, Lars Aske...
KDD
2001
ACM
A scalable algorithm for clustering protein sequences
Valerie Guralnik, George Karypis
KDD
2001
ACM
A learning algorithm for string assembly
Mark K. Goldberg, Darren T. Lim, Malik Magdon-Isma...