Sciweavers

KAIS
2007

The pairwise attribute noise detection algorithm

Analyzing the quality of data prior to constructing data mining models is emerging as an important issue. Algorithms for identifying noise in a given data set can provide a good measure of data quality. Considerable attention has been devoted to detecting class noise or labeling errors. In contrast, limited research work has been devoted to detecting instances with attribute noise, in part due to the difficulty of the problem. We present a novel approach for detecting instances with attribute noise and demonstrate its usefulness with case studies using two different real-world software measurement data sets. Our approach, called Pairwise Attribute Noise Detection Algorithm (PANDA), is compared with a nearest neighbor, distance-based outlier detection technique (denoted DM) investigated in related literature. Since what constitutes noise is domain specific, our case studies use a software engineering expert to inspect the instances identified by the two approaches to determine wheth...
Jason Van Hulse, Taghi M. Khoshgoftaar, Haiying Huang
Added 16 Dec 2010
Updated 16 Dec 2010
Type Journal
Year 2007
Where KAIS
Authors Jason Van Hulse, Taghi M. Khoshgoftaar, Haiying Huang