Microarray data often contain multiple missing gene expression values, which degrade the performance of statistical and machine learning algorithms. This paper presents a K ranked diagonal covariance-based missing value estimation algorithm (KRCOV) that demonstrates significantly superior performance compared to the more commonly used K-nearest neighbour (KNN) imputation algorithm when applied to estimate missing values in BRCA1, BRCA2 and Sporadic genetic mutation samples from ovarian cancer. Experimental results confirm that KRCOV outperformed both KNN and zero imputation in terms of classification accuracy when used to impute randomly missing values at rates from 1% to 5%. The classifier used for this purpose was the Generalized Regression Neural Network. The paper also provides a hypothesis for why KRCOV performs better than KNN, not only for bioinformatics data but also for other data types with strongly correlated values.
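To make the evaluation setup concrete, the following is a minimal sketch of the two baseline imputers named above (zero imputation and KNN imputation) applied to values removed at random. It assumes a genes-by-samples matrix, Euclidean distance over co-observed columns, and a plain mean over the k nearest genes; the function names and these choices are illustrative assumptions, not the authors' implementation. KRCOV itself is not sketched because its steps are not specified in this text.

```python
import numpy as np


def mask_at_random(X, rate, rng):
    """Return a float copy of X with a fraction `rate` of entries set to NaN."""
    X_missing = np.array(X, dtype=float)  # np.array copies by default
    n_missing = int(round(rate * X_missing.size))
    idx = rng.choice(X_missing.size, size=n_missing, replace=False)
    X_missing.flat[idx] = np.nan
    return X_missing


def zero_impute(X_missing):
    """Replace every missing entry with 0."""
    return np.where(np.isnan(X_missing), 0.0, X_missing)


def knn_impute(X_missing, k=3):
    """Fill each missing entry with the mean of that column over the k
    nearest rows (genes), where distance is the root mean squared
    difference over columns observed in both rows (an assumed choice)."""
    X_filled = X_missing.copy()
    observed = ~np.isnan(X_missing)
    n_rows = X_missing.shape[0]
    for i in range(n_rows):
        for j in np.flatnonzero(~observed[i]):
            candidates = []
            for r in range(n_rows):
                if r == i or not observed[r, j]:
                    continue
                shared = observed[i] & observed[r]
                if not shared.any():
                    continue
                diff = X_missing[i, shared] - X_missing[r, shared]
                candidates.append((np.sqrt(np.mean(diff ** 2)), X_missing[r, j]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                X_filled[i, j] = np.mean([value for _, value in candidates[:k]])
            else:
                # No usable neighbour: fall back to zero imputation.
                X_filled[i, j] = 0.0
    return X_filled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 12))  # synthetic stand-in for expression data
    for rate in (0.01, 0.03, 0.05):
        X_missing = mask_at_random(X, rate, rng)
        hidden = np.isnan(X_missing)
        for name, imputer in (("zero", zero_impute), ("knn", knn_impute)):
            estimate = imputer(X_missing)
            rmse = np.sqrt(np.mean((estimate[hidden] - X[hidden]) ** 2))
            print(f"rate={rate:.2f} {name}: RMSE={rmse:.3f}")
```

The demonstration scores imputers by reconstruction error on synthetic data only for brevity; the comparison described above instead feeds the imputed matrices to a classifier and compares classification accuracy.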
Muhammad Shoaib B. Sehgal, Iqbal Gondal, Laurence