
ICDM
2005
IEEE

Finding Representative Set from Massive Data

In the information age, data is pervasive, and in some applications the volume of data is growing explosively. This massive data volume poses challenges to both human users and computers. In this project, we propose a new model for identifying a representative set from a large database. A representative set is a special subset of the original dataset with three main characteristics: (1) it is significantly smaller than the original dataset; (2) it captures more information from the original dataset than any other subset of the same size; and (3) it has low redundancy among the representatives it contains. We use information-theoretic measures such as mutual information and relative entropy to measure the representativeness of the representative set. We first design a greedy algorithm and then present a heuristic algorithm that delivers much better performance. We run experiments on two real datasets and evaluate the effectiveness of our representative set in terms of coverag...
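The abstract does not give the details of the greedy algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each data item is a set of discrete features, and it greedily adds the item that brings the feature distribution of the selected subset closest to that of the full dataset, measured by relative entropy (KL divergence). The function names, the smoothing constant, and the toy dataset are all hypothetical.

```python
import math
from collections import Counter


def feature_distribution(items, vocabulary, smoothing=1e-9):
    """Smoothed relative frequency of each feature across the given items."""
    counts = Counter()
    for item in items:
        counts.update(item)
    total = sum(counts[f] for f in vocabulary) + smoothing * len(vocabulary)
    return {f: (counts[f] + smoothing) / total for f in vocabulary}


def relative_entropy(p, q):
    """KL divergence D(p || q) for two distributions over the same keys."""
    return sum(p[f] * math.log(p[f] / q[f]) for f in p)


def greedy_representatives(dataset, k):
    """Pick up to k item indices, one at a time, each minimizing D(full || selected).

    Illustrative objective only (an assumption): the paper's actual
    representativeness measure is not specified in the abstract.
    """
    vocabulary = sorted({f for item in dataset for f in item})
    target = feature_distribution(dataset, vocabulary)
    chosen = []
    remaining = list(range(len(dataset)))
    for _ in range(min(k, len(dataset))):
        def divergence_if_added(i):
            candidate = [dataset[j] for j in chosen] + [dataset[i]]
            return relative_entropy(
                target, feature_distribution(candidate, vocabulary)
            )

        best = min(remaining, key=divergence_if_added)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data (hypothetical): items 0 and 1 are redundant with each other.
toy_dataset = [{"a", "b"}, {"a", "b"}, {"c", "d"}, {"c"}]
representatives = greedy_representatives(toy_dataset, 2)
```

On this toy input the sketch avoids picking both of the duplicate items, because a second copy of the same features does not move the subset's distribution any closer to the full dataset's. This mirrors the low-redundancy property described in the abstract, although the real algorithm may achieve it differently.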
Added 24 Jun 2010
Updated 24 Jun 2010
Type Conference
Year 2005
Where ICDM
Authors Feng Pan, Wei Wang 0010, Anthony K. H. Tung, Jiong Yang