Sciweavers

DMIN
2007
Towards Average Case Analysis of Itemset Mining
We perform a statistical analysis and describe the asymptotic behavior of the frequency and size distribution of δ-occurrent, minimal δ-occurrent, and maximal δ-occurrent item...
Dan Singer, David J. Haglin, Anna M. Manning
DMIN
2007
A Fast KNN Algorithm Based on Simulated Annealing
K-Nearest Neighbor (KNN) is widely used in text classification, but it has one deficiency: low computational efficiency. In this paper, we propose a heuristic search method to find the k ...
Chuanyao Yang, Yuqin Li, Chenghong Zhang, Yunfa Hu
DMIN
2007
Cost-Sensitive Learning vs. Sampling: Which is Best for Handling Unbalanced Classes with Unequal Error Costs?
A classifier built from a data set with a highly skewed class distribution generally predicts the more frequently occurring classes much more often than the infrequently occurr...
Gary M. Weiss, Kate McCarthy, Bibi Zabar
DMIN
2007
Generative Oversampling for Mining Imbalanced Datasets
One way to handle data mining problems where class prior probabilities and/or misclassification costs between classes are highly unequal is to resample the data until a new, d...
Alexander Liu, Joydeep Ghosh, Cheryl Martin
DMIN
2007
Crawling Attacks Against Web-based Recommender Systems
User profiles derived from Web navigation data are used in important e-commerce applications such as Web personalization, recommender systems, and Web analytics. In the open en...
Runa Bhaumik, Robin D. Burke, Bamshad Mobasher
DMIN
2007
On Minimal Infrequent Itemset Mining
A new algorithm for minimal infrequent itemset mining is presented. Potential applications of finding infrequent itemsets include statistical disclosure risk assessment, bioinf...
David J. Haglin, Anna M. Manning
DMIN
2007
Efficient Summarization Based On Categorized Keywords
The amount of information on the World Wide Web is large enough to distract users who are trying to find useful information. To overcome the large amount...
Christos Bouras, Vassilis Poulopoulos, Vassilis Ts...
DMIN
2007
A Clustering Approach for Achieving Data Privacy
New privacy regulations, together with ever-increasing data availability and computational power, have created a huge interest in data privacy research. One major researc...
Alina Campan, Traian Marius Truta, John Miller, Ra...
DMIN
2007
Mining Frequent Itemsets Using Re-Usable Data Structure
Several algorithms have been introduced for mining frequent itemsets. The recent dataset-transformation approach suffers either from a possible increase in the number of struc...
Mohamed Yakout, Alaaeldin M. Hafez, Hussein Aly
DMIN
2007
Instance Ranking using Ensemble Spread
This paper investigates a technique for predicting ensemble uncertainty that was originally proposed in the weather forecasting domain. The overall purpose is to determine whether the technique...
Rikard König, Ulf Johansson, Lars Niklasson