DISC: Data-Intensive Similarity Measure for Categorical Data



Abstract. The concept of similarity is fundamentally important in almost every scientific field. Clustering, distance-based outlier detection, classification, regression, and search are major data mining techniques that compute similarities between instances, so the choice of similarity measure can be a major cause of an algorithm's success or failure. Defining similarity or distance for categorical data is less straightforward than for continuous data and remains a major challenge: the values taken by a categorical attribute are not inherently ordered, so two categorical values cannot be compared directly. In addition, the notion of similarity can differ depending on the particular domain, dataset, or task at hand. In this paper we present a new similarity measure for categorical data, DISC (Data-Intensive Similarity Measure for Categorical Data). DISC captures the semant...
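To illustrate why unordered categorical values are hard to compare, here is a minimal sketch of the simple overlap (matching) measure, a common baseline for categorical data. It is not the DISC measure described in the abstract; the function name `overlap_similarity` and the example records are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def overlap_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Return the fraction of attributes on which two categorical records agree."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if not a:
        raise ValueError("records must have at least one attribute")
    # Each attribute contributes 1 on an exact match and 0 otherwise;
    # no notion of "closeness" between distinct values is available.
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


# Hypothetical records with attributes (colour, shape, size).
r1 = ("red", "round", "small")
r2 = ("orange", "round", "small")
r3 = ("blue", "round", "small")

print(overlap_similarity(r1, r2))
print(overlap_similarity(r1, r3))
```

Both calls give the same score, because the overlap measure treats every mismatch alike: it cannot express that, in some domain, "red" might be considered closer to "orange" than to "blue". Capturing such relationships requires information beyond exact equality of values.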

Aditya Desai, Himanshu Singh, Vikram Pudi


Data Mining | Data Mining Techniques | PAKDD 2011 | Similarity Measure | Similarity Measures |


Post Info

Added	16 Sep 2011
Updated	16 Sep 2011
Type	Journal
Year	2011
Where	PAKDD
Authors	Aditya Desai, Himanshu Singh, Vikram Pudi
