A Regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data



Background: As a variety of functional genomic and proteomic techniques become available, there is an increasing need for functional analysis methodologies that integrate heterogeneous data sources.

Methods: In this paper, we address this issue by proposing a general framework for gene function prediction based on the k-nearest-neighbor (KNN) algorithm. The choice of KNN is motivated by its simplicity, flexibility to incorporate different data types and adaptability to irregular feature spaces. A weakness of traditional KNN methods, especially when handling heterogeneous data, is that performance is subject to the often ad hoc choice of similarity metric. To address this weakness, we apply regression methods to infer a similarity metric as a weighted combination of a set of base similarity measures, which helps to locate the neighbors that are most likely to be in the same class as the target gene. We also suggest a novel voting scheme to generate confidence scores that estimate the a...
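The idea of learning a similarity metric as a weighted combination of base similarity measures can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which regression method is used, so ordinary least squares is assumed here, and the function names (`fit_similarity_weights`, `combined_similarity`, `knn_predict`) are hypothetical. The vote is a plain similarity-weighted vote standing in for the paper's voting scheme, whose description is cut off above.

```python
import numpy as np

def fit_similarity_weights(base_sims, labels):
    """Fit weights for combining base similarities (assumption: least squares).

    base_sims: array of shape (m, n, n), one similarity matrix per data source.
    labels:    array of shape (n,), class label of each training gene.
    Returns an array of shape (m + 1,): intercept followed by one weight per source.
    """
    m, n, _ = base_sims.shape
    rows, cols = np.triu_indices(n, k=1)            # every unordered gene pair once
    X = np.stack([s[rows, cols] for s in base_sims], axis=1)
    X = np.hstack([np.ones((X.shape[0], 1)), X])    # intercept column
    # Regression target: 1 if the two genes share a class, 0 otherwise.
    y = (labels[rows] == labels[cols]).astype(float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def combined_similarity(w, sims_to_train):
    """Weighted combination of base similarities between a query and training genes.

    sims_to_train: array of shape (m, n).
    """
    return w[0] + w[1:] @ sims_to_train

def knn_predict(w, sims_to_train, labels, k=3):
    """Pick the k most similar training genes and take a similarity-weighted vote."""
    score = combined_similarity(w, sims_to_train)
    neighbors = np.argsort(score)[::-1][:k]
    votes = {}
    for i in neighbors:
        # Clip at zero so a negative combined score never counts against a class.
        votes[labels[i]] = votes.get(labels[i], 0.0) + max(score[i], 0.0)
    best = max(votes, key=votes.get)
    return best, votes
```

In this sketch an uninformative data source receives a weight near zero, so it has little influence on which neighbors are selected. A logistic regression would be a natural alternative to least squares, since the target is binary.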

Zizhen Yao, Walter L. Ruzzo


BMCBI 2006 | Heterogeneous Data | KNN Methods | Prediction |


Added	10 Dec 2010
Updated	10 Dec 2010
Type	Journal
Year	2006
Where	BMCBI
Authors	Zizhen Yao, Walter L. Ruzzo
