KDD
2004
ACM
Learning spatially variant dissimilarity (SVaD) measures
Clustering algorithms typically operate on a feature vector representation of the data and find clusters that are compact with respect to an assumed (dis)similarity measure betwee...
Krishna Kummamuru, Raghu Krishnapuram, Rakesh Agra...
KDD
2004
ACM
Learning to detect malicious executables in the wild
In this paper, we describe the development of a fielded application for detecting malicious executables in the wild. We gathered 1971 benign and 1651 malicious executables and enc...
Jeremy Z. Kolter, Marcus A. Maloof
KDD
2004
ACM
Improved robustness of signature-based near-replica detection via lexicon randomization
Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
KDD
2004
ACM
Towards parameter-free data mining
Most data mining algorithms require the setting of many input parameters. Two main dangers of working with parameter-laden algorithms are the following. First, incorrect settings ...
Eamonn J. Keogh, Stefano Lonardi, Chotirat (Ann) R...
KDD
2004
ACM
When do data mining results violate privacy?
Privacy-preserving data mining has concentrated on obtaining valid results when the input data is private. An extreme example is Secure Multiparty Computation-based methods, where...
Murat Kantarcioglu, Jiashun Jin, Chris Clifton
KDD
2004
ACM
Web usage mining based on probabilistic latent semantic analysis
The primary goal of Web usage mining is the discovery of patterns in the navigational behavior of Web users. Standard approaches, such as clustering of user sessions and discoveri...
Xin Jin, Yanzan Zhou, Bamshad Mobasher
KDD
2004
ACM
Mining coherent gene clusters from gene-sample-time microarray data
Extensive studies have shown that mining microarray data sets is important in bioinformatics research and biomedical applications. In this paper, we explore a novel type of genesa...
Daxin Jiang, Jian Pei, Murali Ramanathan, Chun Tan...
KDD
2004
ACM
Why collective inference improves relational classification
Procedures for collective inference make simultaneous statistical judgments about the same variables for a set of related data instances. For example, collective inference could b...
David Jensen, Jennifer Neville, Brian Gallagher
KDD
2004
ACM
Mining the space of graph properties
Existing data mining algorithms on graphs look for nodes satisfying specific properties, such as specific notions of structural similarity or specific measures of link-based impor...
Glen Jeh, Jennifer Widom
KDD
2004
ACM
Interestingness of frequent itemsets using Bayesian networks as background knowledge
The paper presents a method for pruning frequent itemsets based on background knowledge represented by a Bayesian network. The interestingness of an itemset is defined as the abso...
Szymon Jaroszewicz, Dan A. Simovici