The problem of automatically classifying the gender of a blog author has important applications in many commercial domains. Existing systems mainly use features such as words, wor...
Classification in imbalanced domains is a recent challenge in machine learning. We refer to imbalanced classification when data presents many examples from one class and few from ...
This paper proposes a new class-based method to estimate the strength of association in word co-occurrence for the purpose of structural disambiguation. To deal with sparseness of...
To facilitate more meaningful interpretation considering the internal interdependency relationships between data values, a new form of high-order (multiple-valued) pattern known a...
Labeled data for classification could often be obtained by sampling that restricts or favors choice of certain classes. A classifier trained using such data will be biased, resulti...