Sciweavers

COMAD
2008

Discovering Interesting Subsets Using Statistical Analysis

In this paper we present algorithms for identifying interesting subsets of a given database of records. In many real-life applications, it is important to automatically discover subsets of records that are interesting with respect to a given measure. For example, in a customer support database, it is important to identify subsets of tickets whose service time is much larger (or much smaller) than the service time of the rest of the tickets. We use Student's t-test to check whether the measure values for a subset and its complement differ significantly. We first discuss the brute-force approach and then present a heuristic-based state-space search algorithm to discover interesting subsets of the given database. To apply the proposed heuristic-based approach to large data sets, we then present a sampling-based algorithm that combines sampling with the proposed heuristics to efficiently identify interesting subsets. We discuss an application of the...
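The subset-versus-complement test described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that candidate subsets are defined by a single attribute=value selector, that Student's t-test is used in its pooled-variance form, and that a fixed cutoff of 2.0 on |t| stands in for a proper critical value. The names pooled_t_statistic, interesting_subsets, the ticket fields, and the sample data are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(subset, complement):
    """Student's two-sample t statistic with pooled variance."""
    n1, n2 = len(subset), len(complement)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    v1, v2 = variance(subset), variance(complement)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    if pooled == 0:
        raise ValueError("pooled variance is zero; t is undefined")
    return (mean(subset) - mean(complement)) / sqrt(pooled * (1 / n1 + 1 / n2))


def interesting_subsets(records, attributes, measure, threshold=2.0):
    """Brute-force scan over single attribute=value selectors (an assumption).

    Returns (attribute, value, t) triples with |t| >= threshold,
    ordered by decreasing |t|.
    """
    found = []
    for attr in attributes:
        for value in sorted({r[attr] for r in records}):
            inside = [r[measure] for r in records if r[attr] == value]
            outside = [r[measure] for r in records if r[attr] != value]
            if len(inside) < 2 or len(outside) < 2:
                continue  # too few records to estimate a variance
            try:
                t = pooled_t_statistic(inside, outside)
            except ValueError:
                continue  # degenerate case: no spread in either group
            if abs(t) >= threshold:
                found.append((attr, value, t))
    return sorted(found, key=lambda item: abs(item[2]), reverse=True)


# Hypothetical customer support tickets; service_time is the measure.
tickets = [
    {"priority": "high", "region": "A", "service_time": 30},
    {"priority": "high", "region": "B", "service_time": 32},
    {"priority": "high", "region": "A", "service_time": 34},
    {"priority": "low", "region": "A", "service_time": 5},
    {"priority": "low", "region": "B", "service_time": 6},
    {"priority": "low", "region": "A", "service_time": 7},
    {"priority": "low", "region": "B", "service_time": 8},
]

found = interesting_subsets(tickets, ["priority", "region"], "service_time")
for attr, value, t in found:
    print(f"{attr}={value}: t = {t:.2f}")
```

On this toy data only the priority selectors pass the cutoff, while the region selectors do not. A brute-force scan of this kind grows quickly once selectors combine several attributes, which is the cost the heuristic search and sampling described in the abstract are meant to reduce.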
Maitreya Natu, Girish Palshikar
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where COMAD
Authors Maitreya Natu, Girish Palshikar