Sciweavers

KDD
2006
ACM
GPLAG: detection of software plagiarism by program dependence graph analysis
Along with the blossom of open source projects comes the convenience for software plagiarism. A company, if less self-disciplined, may be tempted to plagiarize some open source pr...
Chao Liu, Chen Chen, Jiawei Han, Philip S. Yu
KDD
2006
ACM
Maximum profit mining and its application in software development
While most software defects (i.e., bugs) are corrected and tested as part of the lengthy software development cycle, enterprise software vendors often have to release software pro...
Charles X. Ling, Victor S. Sheng, Tilmann F. W. Br...
KDD
2006
ACM
Very sparse random projections
There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space. Let A ∈ Rn...
Ping Li, Trevor Hastie, Kenneth Ward Church
KDD
2006
ACM
Sampling from large graphs
Given a huge real graph, how can we derive a representative sample? There are many known algorithms to compute interesting measures (shortest paths, centrality, betweenness, etc.)...
Jure Leskovec, Christos Faloutsos
KDD
2006
ACM
Workload-aware anonymization
Protecting data privacy is an important problem in microdata distribution. Anonymization algorithms typically aim to protect individual privacy, with minimal impact on the quality...
Kristen LeFevre, David J. DeWitt, Raghu Ramakrishn...
KDD
2006
ACM
Bias and controversy: beyond the statistical deviation
In this paper, we investigate how deviation in evaluation activities may reveal bias on the part of reviewers and controversy on the part of evaluated objects. We focus on a `data...
Hady Wirawan Lauw, Ee-Peng Lim, Ke Wang
KDD
2006
ACM
Cryptographically private support vector machines
We study the problem of private classification using kernel methods. More specifically, we propose private protocols implementing the Kernel Adatron and Kernel Perceptron learning ...
Helger Lipmaa, Sven Laur, Taneli Mielikäinen
KDD
2006
ACM
New EM derived from Kullback-Leibler divergence
We introduce a new EM framework in which it is possible not only to optimize the model parameters but also the number of model components. A key feature of our approach is that we...
Longin Jan Latecki, Marc Sobel, Rolf Lakämper
KDD
2006
ACM
Algorithms for storytelling
Deept Kumar, Naren Ramakrishnan, Richard F. Helm, ...
KDD
2006
ACM
Hierarchical topic segmentation of websites
In this paper, we consider the problem of identifying and segmenting topically cohesive regions in the URL tree of a large website. Each page of the website is assumed to have a t...
Ravi Kumar, Kunal Punera, Andrew Tomkins