
ICDM
2010
IEEE

Assessing Data Mining Results on Matrices with Randomization

Abstract: Randomization is a general technique for evaluating the significance of data analysis results. In randomization-based significance testing, a result is considered interesting if an equally good result is unlikely to be obtained on random data that shares some basic properties with the original data. Recently, the randomization approach has been applied to assess data mining results on binary matrices and limited types of real-valued matrices. In these works, the row and column value distributions are approximately preserved in randomization. However, the previous approaches suffer from various technical and practical shortcomings. In this paper, we give solutions to these problems and introduce a new practical algorithm for randomizing various types of matrices while preserving the row and column value distributions more accurately. We propose a new approach for randomizing matrices containing features measured in different scales. Compared to previous work, our approach can be applied...
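The general idea of randomization-based significance testing can be illustrated with a minimal Python sketch. It is not the algorithm introduced in the paper: it assumes the simplest setting the abstract mentions, a binary matrix, and uses a basic swap scheme that keeps every row and column sum fixed. The function names (`swap_randomize`, `max_cooccurrence`, `empirical_p_value`), the choice of statistic, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def swap_randomize(matrix, n_attempts, rng):
    """Return a randomized copy of a 0/1 matrix with unchanged row and column sums."""
    m = matrix.copy()
    n_rows, n_cols = m.shape
    for _ in range(n_attempts):
        r1, r2 = rng.integers(0, n_rows, size=2)
        c1, c2 = rng.integers(0, n_cols, size=2)
        # A swap is allowed only on a "checkerboard" 2x2 submatrix:
        # ones on one diagonal, zeros on the other.
        if m[r1, c1] == 1 and m[r2, c2] == 1 and m[r1, c2] == 0 and m[r2, c1] == 0:
            m[r1, c1] = m[r2, c2] = 0
            m[r1, c2] = m[r2, c1] = 1
    return m


def max_cooccurrence(matrix):
    """Illustrative statistic: largest number of rows shared by two distinct columns."""
    counts = matrix.T @ matrix
    np.fill_diagonal(counts, 0)  # ignore a column paired with itself
    return counts.max()


def empirical_p_value(matrix, statistic, n_samples=200, n_attempts=1000, seed=0):
    """Fraction of randomized matrices whose statistic is at least the observed one."""
    rng = np.random.default_rng(seed)
    observed = statistic(matrix)
    hits = 0
    for _ in range(n_samples):
        randomized = swap_randomize(matrix, n_attempts, rng)
        if statistic(randomized) >= observed:
            hits += 1
    # Adding one to numerator and denominator keeps the estimate strictly positive.
    return (1 + hits) / (1 + n_samples)
```

A small p-value from `empirical_p_value(data, max_cooccurrence)` would suggest that the observed result is not explained by the row and column sums alone. Real-valued matrices and features measured on different scales, which the abstract addresses, need a different randomization step than this binary swap.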
Markus Ojala
Added 12 Feb 2011
Updated 12 Feb 2011
Type Journal
Year 2010
Where ICDM
Authors Markus Ojala