Assessing data mining results via swap randomization

The problem of assessing the significance of data mining results on high-dimensional 0–1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and finding correlations, significance testing can be done by, e.g., chi-square tests, or many other methods. However, the results of such tests depend only on the specific attributes and not on the dataset as a whole. Moreover, the tests are more difficult to apply to sets of patterns or other complex results of data mining. In this paper, we consider a simple randomization technique that deals with this shortcoming. The approach consists of producing random datasets that have the same row and column margins as the given dataset, computing the results of interest on the randomized instances, and comparing them against the results on the actual data. This randomization technique can be used to assess the results of many different types of data mining algorithms, such as frequent sets, clustering, ...
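
The procedure the abstract outlines (randomize while keeping row and column margins, recompute the result, compare with the real data) can be illustrated with a minimal sketch. It assumes the commonly used local swap step, in which two 1s at opposite corners of a rectangle are exchanged with the two 0s at the other corners; the truncated abstract does not spell out this step. The function names (`swap_randomize`, `pair_support`, `empirical_p_value`), the parameter defaults, and the choice of pair support as the statistic are all hypothetical and chosen only for illustration.

```python
import random


def swap_randomize(matrix, num_attempts, rng):
    """Return a copy of a 0-1 matrix perturbed by margin-preserving swaps.

    Assumption: each attempt picks two 1-entries at random and swaps them
    only if that keeps every row sum and column sum unchanged.
    """
    data = [row[:] for row in matrix]  # leave the input untouched
    ones = [(r, c) for r, row in enumerate(data)
            for c, v in enumerate(row) if v == 1]
    if len(ones) < 2:
        return data
    for _ in range(num_attempts):
        i = rng.randrange(len(ones))
        j = rng.randrange(len(ones))
        (r1, c1), (r2, c2) = ones[i], ones[j]
        # The two 1s must lie in different rows and different columns.
        if r1 == r2 or c1 == c2:
            continue
        # The other two corners of the rectangle must both be 0.
        if data[r1][c2] == 1 or data[r2][c1] == 1:
            continue
        data[r1][c1] = data[r2][c2] = 0
        data[r1][c2] = data[r2][c1] = 1
        ones[i], ones[j] = (r1, c2), (r2, c1)
    return data


def pair_support(matrix, a, b):
    """Count the rows in which columns a and b are both 1."""
    return sum(1 for row in matrix if row[a] == 1 and row[b] == 1)


def empirical_p_value(matrix, statistic, num_samples=200,
                      num_attempts=1000, seed=0):
    """Fraction of randomized datasets scoring at least the observed value.

    The +1 terms count the observed dataset itself, so the estimate
    is never exactly zero.
    """
    rng = random.Random(seed)
    observed = statistic(matrix)
    at_least = sum(
        1 for _ in range(num_samples)
        if statistic(swap_randomize(matrix, num_attempts, rng)) >= observed
    )
    return (at_least + 1) / (num_samples + 1)
```

A call such as `empirical_p_value(data, lambda m: pair_support(m, 0, 1))` would then estimate how often a co-occurrence count at least as large as the observed one arises in datasets with the same margins. The number of swap attempts needed for adequate mixing depends on the data, and the defaults here are placeholders rather than recommendations.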

Aristides Gionis, Heikki Mannila, Panayiotis Tsaparas, Taneli Mielikäinen

Data Mining | Data Mining Algorithms | KDD 2006 | Simple Randomization Technique | Swap Randomization Method

Post Info

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2006
Where	KDD
Authors	Aristides Gionis, Heikki Mannila, Panayiotis Tsaparas, Taneli Mielikäinen

Sciweavers