Discovering Interesting Subsets Using Statistical Analysis

Download www.cse.iitb.ac.in

In this paper we present algorithms for identifying interesting subsets of a given database of records. In many real-life applications, it is important to automatically discover subsets of records that are interesting with respect to a given measure. For example, in a customer support database, it is important to identify subsets of tickets whose service time is too large (or too small) compared with the service time of the rest of the tickets. We use Student's t-test to check whether the measure values for a subset and its complement differ significantly. We first discuss the brute-force approach and then present a heuristic-based state-space search algorithm to discover interesting subsets of the given database. To apply the proposed heuristic-based approach to large data sets, we then present a sampling-based algorithm that combines sampling with the proposed heuristics to efficiently identify interesting subsets. We discuss an application of the...
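The subset-versus-complement test and the brute-force search described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It rests on several assumptions that do not come from the paper: subsets are described by attribute=value combinations, the equal-variance (pooled) form of Student's t statistic is used, and significance is approximated by a fixed cutoff `t_crit` rather than a critical value looked up for the actual degrees of freedom. The names `student_t`, `interesting_subsets`, `t_crit`, `max_attrs`, `min_size`, and the sample `tickets` data are all hypothetical.

```python
from itertools import combinations
from math import copysign, inf, sqrt
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic using pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    diff = mean(a) - mean(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled == 0:
        # Both groups are constant: either identical or perfectly separated.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / sqrt(pooled * (1 / na + 1 / nb))


def interesting_subsets(records, attributes, measure, t_crit=2.0, max_attrs=1, min_size=2):
    """Brute force: test every attribute=value descriptor against its complement.

    t_crit is an assumed fixed cutoff; a careful version would derive it from
    the t distribution for a chosen significance level and degrees of freedom.
    """
    found = []
    for k in range(1, max_attrs + 1):
        for attrs in combinations(attributes, k):
            keys = [tuple(r[a] for a in attrs) for r in records]
            for values in sorted(set(keys)):
                inside = [r[measure] for r, key in zip(records, keys) if key == values]
                outside = [r[measure] for r, key in zip(records, keys) if key != values]
                # The sample variance needs at least two points on each side.
                if len(inside) < min_size or len(outside) < min_size:
                    continue
                t = student_t(inside, outside)
                if abs(t) > t_crit:
                    found.append((dict(zip(attrs, values)), t))
    return found


# Hypothetical support tickets; "hours" plays the role of the service-time measure.
tickets = [
    {"priority": "high", "region": "east", "hours": 30},
    {"priority": "high", "region": "west", "hours": 34},
    {"priority": "high", "region": "east", "hours": 32},
    {"priority": "low", "region": "east", "hours": 5},
    {"priority": "low", "region": "west", "hours": 7},
    {"priority": "low", "region": "east", "hours": 6},
    {"priority": "low", "region": "west", "hours": 4},
    {"priority": "low", "region": "west", "hours": 8},
]

result = interesting_subsets(tickets, ["priority", "region"], "hours")
for descriptor, t in result:
    print(descriptor, round(t, 2))
```

The number of descriptors grows combinatorially with `max_attrs`, which is presumably why the abstract moves on from brute force to heuristic search and sampling. Neither of those is sketched here, since the listing does not describe them in detail.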

Maitreya Natu, Girish Palshikar

COMAD 2008 | Interesting Subsets | Knowledge Management | Large Data Sets | Service Times |

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	COMAD
Authors	Maitreya Natu, Girish Palshikar
