Sciweavers

KDD 2007, ACM
Efficient mining of iterative patterns for software specification discovery
Studies have shown that program comprehension takes up to 45% of software development costs. Such high costs are caused by the lack of documented specification and further aggrava...
Chao Liu, David Lo, Siau-Cheng Khoo
KDD 2007, ACM
BoostCluster: boosting clustering by pairwise constraints
Data clustering is an important task in many disciplines. A large number of studies have attempted to improve clustering by using the side information that is often encoded as pai...
Yi Liu, Rong Jin, Anil K. Jain
KDD 2007, ACM
Cost-effective outbreak detection in networks
Given a water distribution network, where should we place sensors to quickly detect contaminants? Or, which blogs should we read to avoid missing important stories? These seemingl...
Andreas Krause, Carlos Guestrin, Christos Faloutso...
KDD 2007, ACM
A fast algorithm for finding frequent episodes in event streams
Frequent episode discovery is a popular framework for mining data available as a long sequence of events. An episode is essentially a short ordered sequence of event types and the...
Srivatsan Laxman, P. S. Sastry, K. P. Unnikrishnan
KDD 2007, ACM
Raising the baseline for high-precision text classifiers
Many important application areas of text classifiers demand high precision and it is common to compare prospective solutions to the performance of Naive Bayes. This baseline is us...
Aleksander Kolcz, Wen-tau Yih
KDD 2007, ACM
Practical guide to controlled experiments on the web: listen to your customers not to the hippo
The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments (single-factor or factorial designs), A/B ...
Ron Kohavi, Randal M. Henne, Dan Sommerfield
KDD 2007, ACM
Correlation search in graph databases
Correlation mining has gained great success in many application domains for its ability to capture the underlying dependency between objects. However, the research of correlation ...
Yiping Ke, James Cheng, Wilfred Ng
KDD 2007, ACM
Detecting research topics via the correlation between graphs and texts
In this paper we address the problem of detecting topics in large-scale linked document collections. Recently, topic detection has become a very active area of research due to its...
Yookyung Jo, Carl Lagoze, C. Lee Giles
KDD 2007, ACM
Dynamic hybrid clustering of bioinformatics by incorporating text mining and citation analysis
To unravel the concept structure and dynamics of the bioinformatics field, we analyze a set of 7401 publications from the Web of Science and MEDLINE databases, publication years 1...
Bart De Moor, Frizo A. L. Janssens, Wolfgang Glä...
KDD 2007, ACM
Cleaning disguised missing data: a heuristic approach
In some applications such as filling in a customer information form on the web, some missing values may not be explicitly represented as such, but instead appear as potentially va...
Ming Hua, Jian Pei