Abstract. Data with multi-valued categorical attributes can cause major problems for decision trees. The high branching factor can lead to data fragmentation, where decisions have ...
Uncertainty in categorical data is commonplace in many applications, including data cleaning, database integration, and biological annotation. In such domains, the correct value o...
Sarvjeet Singh, Chris Mayfield, Sunil Prabhakar, R...
Sampling has been recognized as an important technique to improve the efficiency of clustering. However, with sampling applied, those points which are not sampled will not have t...
Since clustering is unsupervised and highly explorative, clustering validation (i.e. assessing the quality of clustering solutions) has been an important and long standing researc...
Today, database relations are widely used and distributed over the Internet. Since these data can be easily tampered with, it is critical to ensure the integrity of these data. In...