Assessing Data Mining Results on Matrices with Randomization

15 years 4 months ago


Abstract: Randomization is a general technique for evaluating the significance of data analysis results. In randomization-based significance testing, a result is considered interesting if an equally good result is unlikely to be obtained on random data that shares some basic properties with the original data. Recently, the randomization approach has been applied to assess data mining results on binary matrices and on limited types of real-valued matrices. In these works, the row and column value distributions are approximately preserved in randomization. However, the previous approaches suffer from various technical and practical shortcomings. In this paper, we give solutions to these problems and introduce a new practical algorithm for randomizing various types of matrices while preserving the row and column value distributions more accurately. We propose a new approach for randomizing matrices containing features measured in different scales. Compared to previous work, our approach can be applied...
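To make the idea of randomization-based significance testing concrete, here is a minimal sketch in Python. It is not the algorithm introduced in the paper: it uses simple swap randomization on a binary matrix, which keeps every row sum and column sum fixed, as a stand-in for the earlier binary-matrix work the abstract mentions. The function names (`swap_randomize`, `empirical_p_value`, `cooccurrence`), the test statistic, and all parameter defaults are assumptions chosen for illustration.

```python
import random


def swap_randomize(matrix, num_attempts, rng):
    """Return a randomized copy of a 0/1 matrix (list of lists).

    Each successful swap rewrites a 2x2 submatrix [[1, 0], [0, 1]]
    into [[0, 1], [1, 0]], so all row and column sums stay the same.
    """
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(num_attempts):
        r1, r2 = rng.randrange(n_rows), rng.randrange(n_rows)
        c1, c2 = rng.randrange(n_cols), rng.randrange(n_cols)
        # Swap only when the four cells form the required pattern;
        # otherwise this attempt leaves the matrix unchanged.
        if (m[r1][c1] == 1 and m[r2][c2] == 1
                and m[r1][c2] == 0 and m[r2][c1] == 0):
            m[r1][c1] = m[r2][c2] = 0
            m[r1][c2] = m[r2][c1] = 1
    return m


def empirical_p_value(matrix, statistic, num_samples=200,
                      num_attempts=1000, seed=0):
    """Fraction of randomized matrices scoring at least as well as the original.

    The +1 terms count the original matrix itself as one sample,
    so the returned value is never exactly zero.
    """
    rng = random.Random(seed)
    observed = statistic(matrix)
    at_least_as_good = sum(
        statistic(swap_randomize(matrix, num_attempts, rng)) >= observed
        for _ in range(num_samples)
    )
    return (at_least_as_good + 1) / (num_samples + 1)


def cooccurrence(matrix, a=0, b=1):
    """Illustrative statistic: number of rows where columns a and b are both 1."""
    return sum(row[a] * row[b] for row in matrix)
```

A small empirical p-value from `empirical_p_value(data, cooccurrence)` would suggest that the observed co-occurrence is not explained by the row and column sums alone. The number of swap attempts needed for adequate mixing depends on the matrix and is left here as a free parameter.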

Markus Ojala
