On the Role of Local Matching for Efficient Semi-supervised Protein Sequence Classification



Recent studies in protein sequence analysis have leveraged the power of unlabeled data. For example, the profile and mismatch neighborhood kernels have shown significant improvements over classifiers estimated under the fully supervised setting. In this study, we present a principled and biologically motivated framework that exploits the unlabeled data more effectively by using only those regions that are more likely to be biologically relevant, which improves prediction accuracy. Because over-represented sequences in large uncurated databases may bias kernel estimates that rely on unlabeled data, we also propose a method to remove this bias and improve the performance of the resulting classifiers. Combined with a computationally efficient sparse family of string kernels, our proposed framework achieves state-of-the-art accuracy in semi-supervised protein remote homology detection on three large unlabeled databases.
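As a rough illustration of the idea behind neighborhood kernels, the sketch below represents a query by the average k-mer counts of its unlabeled neighbors and compares two such representations with a normalized dot product. It is not the authors' implementation: it assumes a plain spectrum (exact k-mer) feature map rather than the mismatch or sparse kernels named in the abstract, it assumes the neighbor sequences or regions have already been retrieved by some search step, and it uses exact-duplicate removal only as a crude stand-in for the paper's bias-removal method. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def spectrum_features(seq, k=3):
    """Count every contiguous k-mer in seq (plain spectrum feature map)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def neighborhood_features(neighbors, k=3, deduplicate=True):
    """Average the k-mer counts over a list of neighbor sequences or regions."""
    if deduplicate:
        # Assumption: keeping one copy of each identical neighbor is only a
        # crude stand-in for handling over-represented sequences.
        neighbors = list(dict.fromkeys(neighbors))
    total = Counter()
    for s in neighbors:
        total.update(spectrum_features(s, k))
    n = len(neighbors)
    return {kmer: count / n for kmer, count in total.items()}


def kernel(fx, fy):
    """Cosine-normalized dot product of two sparse feature maps."""
    dot = sum(v * fy.get(kmer, 0.0) for kmer, v in fx.items())
    nx = sqrt(sum(v * v for v in fx.values()))
    ny = sqrt(sum(v * v for v in fy.values()))
    return dot / (nx * ny) if nx and ny else 0.0


if __name__ == "__main__":
    # Toy neighbor lists; real ones would come from a database search.
    fx = neighborhood_features(["ACDEFG", "ACDEFG", "CDEFGH"], k=3)
    fy = neighborhood_features(["CDEFGH", "DEFGHI"], k=3)
    print(kernel(fx, fy))
```

Passing only the matched regions of each neighbor, instead of full-length sequences, would mimic the local-matching idea described above; how those regions are chosen is left open here.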

Pavel P. Kuksa, Pai-Hsi Huang, Vladimir Pavlovic


BIBM 2008 | Bioinformatics | Large Unlabeled Databases | Mismatch Neighborhood Kernels | Unlabeled Data |


Post Info

Added	12 Oct 2010
Updated	12 Oct 2010
Type	Conference
Year	2008
Where	BIBM
Authors	Pavel P. Kuksa, Pai-Hsi Huang, Vladimir Pavlovic
