Sciweavers

PAKDD
2000
ACM

Performance Controlled Data Reduction for Knowledge Discovery in Distributed Databases

The objective of data reduction is to obtain a compact representation of a large data set, both to facilitate repeated use of non-redundant information with complex, slow learning algorithms and to allow efficient data transfer and storage. Given a user-controllable allowed accuracy loss, we propose an effective data reduction procedure: guided sampling identifies a minimal-size representative subset, and a subsequent model-sensitivity analysis determines an appropriate compression level for each attribute. Experiments were performed on three large data sets; depending on an allowed accuracy loss margin ranging from 1% to 5% of the ideal generalization, the achieved compression rates ranged from 95 to 12,500 times. These results indicate that transferring reduced data sets from multiple locations to a centralized site for efficient and accurate knowledge discovery might often be possible in practice.
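The idea of shrinking a data set until the accuracy loss reaches a user-set margin can be illustrated with a minimal sketch. It is not the authors' procedure: it swaps in plain random progressive sampling (doubling the sample size) for the guided sampling described in the abstract, uses a toy one-dimensional least-squares model with R² as the accuracy measure, and omits the per-attribute compression step entirely. All names here (`fit_line`, `r_squared`, `reduce_by_sampling`, `allowed_loss`) are hypothetical.

```python
import random


def fit_line(points):
    """Ordinary least squares fit of y = a*x + b to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def r_squared(model, points):
    """Coefficient of determination of a fitted line on the given points."""
    a, b = model
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1.0 - ss_res / ss_tot


def reduce_by_sampling(train, holdout, allowed_loss, start=8, rng=None):
    """Return (subset, subset_score, full_score).

    Assumption: plain random sampling with doubling sizes stands in for
    the paper's guided sampling. The loop stops at the first size whose
    holdout score is within `allowed_loss` (a fraction, e.g. 0.01) of
    the score obtained from the full training set.
    """
    rng = rng or random.Random(0)
    full_score = r_squared(fit_line(train), holdout)
    target = (1.0 - allowed_loss) * full_score
    size = start
    while size < len(train):
        subset = rng.sample(train, size)
        score = r_squared(fit_line(subset), holdout)
        if score >= target:
            return subset, score, full_score
        size *= 2
    # No smaller sample met the target: fall back to the full data set.
    return list(train), full_score, full_score


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(42)
data = []
for _ in range(5000):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)))
train, holdout = data[:4000], data[4000:]

subset, score, full_score = reduce_by_sampling(
    train, holdout, allowed_loss=0.01, rng=rng
)
print("compression rate from sampling alone:", len(train) / len(subset))
```

The compression rate printed here reflects row sampling only. The rates reported in the abstract also include the per-attribute compression step, which this sketch does not attempt.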
Slobodan Vucetic, Zoran Obradovic
Added 25 Aug 2010
Updated 25 Aug 2010
Type Conference
Year 2000
Where PAKDD
Authors Slobodan Vucetic, Zoran Obradovic