An Assessment of a Metric Space Database Index to Support Sequence Homology



Hierarchical metric-space clustering methods have been commonly used to organize proteomes into taxonomies. Consequently, it is often anticipated that hierarchical clustering can be leveraged as a basis for scalable database index structures capable of managing the hyper-exponential growth of sequence data. The M-tree is one such data structure, specialized for the management of large data sets on disk. We explore the application of M-trees to the storage and retrieval of peptide sequence data. Exploiting a technique first suggested by Myers, we organize the database as records of fixed-length substrings. Empirical results are promising. However, metric-space indexes are subject to “the curse of dimensionality”, and the ultimate performance of an index is sensitive to the quality of its initial construction. We introduce a new hierarchical bulk-load algorithm that alternates between top-down and bottom-up clustering to initialize the index. Using the Yeast Proteomes, the bi-d...
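
The idea of storing a sequence as fixed-length substring records and searching them under a metric can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the distance function or the window length, so Hamming distance and the window length used here are assumptions, and the function names (`fixed_length_records`, `build_pivot_table`, `range_query`) are hypothetical. A single pivot stands in for the routing objects of an M-tree, to show only how the triangle inequality lets a metric index skip distance computations.

```python
from typing import List, Tuple


def fixed_length_records(sequence: str, q: int) -> List[Tuple[int, str]]:
    """Return (offset, substring) for every overlapping window of length q."""
    if q <= 0:
        raise ValueError("q must be positive")
    return [(i, sequence[i:i + q]) for i in range(len(sequence) - q + 1)]


def hamming(a: str, b: str) -> int:
    """Hamming distance: a simple metric, assumed here for illustration only."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_pivot_table(records: List[Tuple[int, str]],
                      pivot: str) -> List[Tuple[int, str, int]]:
    """Precompute each record's distance to the pivot (done once, at load time)."""
    return [(off, s, hamming(pivot, s)) for off, s in records]


def range_query(table: List[Tuple[int, str, int]], pivot: str,
                query: str, radius: int) -> List[Tuple[int, str]]:
    """Return all records within `radius` of `query`.

    By the triangle inequality, |d(query, pivot) - d(pivot, s)| > radius
    implies d(query, s) > radius, so such records are skipped without
    computing d(query, s).
    """
    d_qp = hamming(query, pivot)
    hits = []
    for off, s, d_ps in table:
        if abs(d_qp - d_ps) > radius:
            continue  # pruned using only the precomputed pivot distance
        if hamming(query, s) <= radius:
            hits.append((off, s))
    return hits


# Toy peptide fragment (made up for this example), window length 3.
records = fixed_length_records("MKTAYIAK", 3)
table = build_pivot_table(records, pivot="MKT")
hits = range_query(table, pivot="MKT", query="KTA", radius=1)
```

A real M-tree arranges many such routing objects hierarchically in disk pages, each with a covering radius, so that whole subtrees can be pruned by the same inequality; the flat single-pivot table above only demonstrates the pruning rule.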

Rui Mao, Weijia Xu, Neha Singh, Daniel P. Miranker


BIBE 2003 | Bioinformatics | Hierarchical | Metric-space Clustering Methods | Sequence Data


Added	04 Jul 2010
Updated	04 Jul 2010
Type	Conference
Year	2003
Where	BIBE
Authors	Rui Mao, Weijia Xu, Neha Singh, Daniel P. Miranker
