Sciweavers

SGAI
2007
Springer

Metrics for Mining Multisets

Abstract. We propose a new class of distance measures (metrics) designed for multisets; both metrics and multisets are recurrent themes in many data mining applications. One particular instance of this class originated from the need to cluster criminal behaviours. These distance measures are parameterized by a function f; provided f satisfies a few simple restrictions, the resulting measure is always a valid metric. This flexibility allows the measures to be tailored to many domain-specific applications. In this paper, the metrics are applied in bio-informatics (genomics), criminal behaviour clustering and text mining. The proposed metric also generalizes some known measures, e.g., the Jaccard distance and the Canberra distance. We discuss several options and compare the behaviour of different instances.
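As a rough illustration of the two known measures named in the abstract, the sketch below computes the standard Jaccard distance on sets and the standard Canberra distance on multisets stored as Python Counter objects. The function normalized_distance and the helper relative_difference are hypothetical names introduced here, and the form "sum f over the elements present in either multiset, then divide by the number of such elements" is an assumption made for illustration, not the definition given in the paper. Under that assumption, choosing f(a, b) = |a - b| / (a + b) gives a size-normalized Canberra distance, and on plain sets it coincides with the Jaccard distance.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Standard Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def canberra_distance(x, y):
    """Standard Canberra distance between two multisets given as Counters."""
    total = 0.0
    for key in set(x) | set(y):
        xi, yi = x[key], y[key]  # a Counter returns 0 for missing keys
        if xi + yi > 0:
            total += abs(xi - yi) / (xi + yi)
    return total


def relative_difference(a, b):
    """Hypothetical choice of f: the per-element Canberra term."""
    return abs(a - b) / (a + b)


def normalized_distance(x, y, f):
    """Assumed illustrative form, not taken from the paper:
    average f over the elements present in either multiset."""
    support = [k for k in set(x) | set(y) if x[k] > 0 or y[k] > 0]
    if not support:
        return 0.0
    return sum(f(x[k], y[k]) for k in support) / len(support)


if __name__ == "__main__":
    x = Counter("aab")   # multiset {a: 2, b: 1}
    y = Counter("abbc")  # multiset {a: 1, b: 2, c: 1}
    print(canberra_distance(x, y))
    print(normalized_distance(x, y, relative_difference))
    # On plain sets the assumed form agrees with the Jaccard distance.
    print(jaccard_distance({"a", "b"}, {"b", "c"}))
    print(normalized_distance(Counter({"a", "b"}), Counter({"b", "c"}),
                              relative_difference))
```

Swapping in a different f changes how strongly a difference in multiplicity is penalized, which is the kind of domain-specific tailoring the abstract describes; whether a given f yields a valid metric depends on the restrictions stated in the paper, which are not reproduced here.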
Walter A. Kosters, Jeroen F. J. Laros
Added 09 Jun 2010
Updated 09 Jun 2010
Type Conference
Year 2007
Where SGAI
Authors Walter A. Kosters, Jeroen F. J. Laros