Sciweavers

PAKDD
2005
ACM
On Multiple Query Optimization in Data Mining
Traditional multiple query optimization methods focus on identifying common subexpressions in sets of relational queries and on constructing their global execution plans. In this p...
Marek Wojciechowski, Maciej Zakrzewicz
PAKDD
2005
ACM
Feature Selection for High Dimensional Face Image Using Self-organizing Maps
While feature selection is very difficult for high dimensional, unstructured data such as face images, it may be much easier to do if the data can be faithfully transformed into l...
Xiaoyang Tan, Songcan Chen, Zhi-Hua Zhou, Fuyan Zh...
PAKDD
2005
ACM
Automatic Occupation Coding with Combination of Machine Learning and Hand-Crafted Rules
We apply a machine learning method to occupation coding, the task of categorizing answers to open-ended questions regarding the respondent’s occupation. Sp...
Kazuko Takahashi, Hiroya Takamura, Manabu Okumura
PAKDD
2005
ACM
Improved Bayesian Spam Filtering Based on Co-weighted Multi-area Information
Bayesian spam filters, in general, compute probability estimations for tokens either without considering the email areas of occurrences except the body or treating the s...
Raju Shrestha, Yaping Lin
PAKDD
2005
ACM
Speeding-Up Hierarchical Agglomerative Clustering in Presence of Expensive Metrics
In several contexts and domains, hierarchical agglomerative clustering (HAC) offers best-quality results, but at the price of a high complexity which reduces the size of datasets ...
Mirco Nanni
PAKDD
2005
ACM
Increasing Classification Accuracy by Combining Adaptive Sampling and Convex Pseudo-Data
The availability of microarray data has enabled several studies on the application of aggregated classifiers for molecular classification. We present a combination of classifier ag...
Chia Huey Ooi, Madhu Chetty
PAKDD
2005
ACM
Adjusting Mixture Weights of Gaussian Mixture Model via Regularized Probabilistic Latent Semantic Analysis
Mixture models, such as the Gaussian Mixture Model, have been widely used in many applications for modeling data. The Gaussian mixture model (GMM) assumes that data points are generated fr...
Luo Si, Rong Jin
PAKDD
2005
ACM
Conditional Random Fields for Transmembrane Helix Prediction
It is estimated that 20% of genes in the human genome encode integral membrane proteins (IMPs), and some estimates are much higher. IMPs control a broad range of event...
Lior Lukov, Sanjay Chawla, W. Bret Church
PAKDD
2005
ACM
Covariance and PCA for Categorical Variables
Covariances from categorical variables are defined using a regular simplex expression for categories. The method follows the variance definition by Gini, and it gives the covaria...
Hirotaka Niitsuma, Takashi Okada
PAKDD
2005
ACM
Online Algorithms for Mining Inter-stream Associations from Large Sensor Networks
We study the problem of mining frequent value sets from a large sensor network. We discuss how sensor stream data could be represented in a way that facilitates efficient online mining and ...
K. K. Loo, Ivy Tong, Ben Kao