Sciweavers

DMIN
2008
Multi-Class SVM for Large Data Sets Considering Models of Classes Distribution
Support Vector Machines (SVMs) have attracted considerable interest among researchers. One important issue concerning SVMs is their application to large data sets. It is rec...
Jair Cervantes, Xiaoou Li, Wen Yu
DMIN
2008
Approximate Computation of Object Distances by Locality-Sensitive Hashing
We propose an approximate computation technique for inter-object distances in binary data sets. Our approach is based on locality-sensitive hashing and scales up with the number ...
Selim Mimaroglu, Dan A. Simovici
DMIN
2008
Optimization of Self-Organizing Maps Ensemble in Prediction
The knowledge discovery process encounters difficulties when analyzing large amounts of data. Indeed, some theoretical problems related to high-dimensional spaces then appear and de...
Elie Prudhomme, Stéphane Lallich
DMIN
2008
PCS: An Efficient Clustering Method for High-Dimensional Data
Clustering algorithms play an important role in data analysis and information retrieval. How to obtain a clustering for a large set of high-dimensional data suitable for database ap...
Wei Li 0011, Cindy Chen, Jie Wang
DMIN
2007
Mining for Structural Anomalies in Graph-based Data
In this paper we present graph-based approaches to mining for anomalies in domains where the anomalies consist of unexpected entity/relationship alterations that closely resembl...
William Eberle, Lawrence B. Holder
DMIN
2007
Evaluation of Feature Selection Techniques for Analysis of Functional MRI and EEG
The application of feature selection techniques greatly reduces the computational cost of classifying high-dimensional data. Feature selection algorithms of varying performance ...
Lauren Burrell, Otis Smart, George J. Georgoulas, ...
DMIN
2007
Towards Average Case Analysis of Itemset Mining
We perform a statistical analysis and describe the asymptotic behavior of the frequency and size distribution of δ-occurrent, minimal δ-occurrent, and maximal δ-occurrent item...
Dan Singer, David J. Haglin, Anna M. Manning
DMIN
2007
A Fast KNN Algorithm Based on Simulated Annealing
K-Nearest Neighbor is widely used in text classification, but it has one deficiency: computational efficiency. In this paper, we propose a heuristic search method to find the k ...
Chuanyao Yang, Yuqin Li, Chenghong Zhang, Yunfa Hu
DMIN
2007
Cost-Sensitive Learning vs. Sampling: Which is Best for Handling Unbalanced Classes with Unequal Error Costs?
The classifier built from a data set with a highly skewed class distribution generally predicts the more frequently occurring classes much more often than the infrequently occurr...
Gary M. Weiss, Kate McCarthy, Bibi Zabar