Towards Index-based Similarity Search for Protein Structure Databases



We propose two methods for finding similarities in protein structure databases. Both techniques extract feature vectors from triplets of secondary structure elements (SSEs) of proteins and index these vectors in a multidimensional index structure. The first technique addresses the problem of finding proteins similar to a given query protein in a protein dataset. It uses the index structure to quickly find promising proteins, which are then aligned to the query protein using a popular pairwise alignment tool such as VAST. We also develop a novel statistical model that uses the SSEs to estimate the goodness of a match. The second technique addresses the problem of joining two protein datasets to find all-to-all similarities. Experimental results show that our techniques improve on the pruning time of VAST by a factor of 3 to 3.5 while keeping the sensitivity similar.
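
The filter step described above (index triplet feature vectors, then pass only promising proteins to the aligner) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: each SSE is approximated as a 3D line segment, the hypothetical 6-dimensional feature vector holds three midpoint distances and three axis angles, and a coarse hash grid stands in for a real multidimensional index. The function names and step sizes are likewise invented.

```python
import itertools
import math
from collections import Counter, defaultdict

# Assumption: an SSE is a 3D line segment given as (start_xyz, end_xyz).
# Segments must have non-zero length, otherwise the angle is undefined.

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _mid(sse):
    start, end = sse
    return tuple((x + y) / 2 for x, y in zip(start, end))

def _angle(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def triplet_features(a, b, c):
    """Hypothetical feature vector: 3 midpoint distances, then 3 axis angles."""
    pairs = [(a, b), (a, c), (b, c)]
    dists = [math.dist(_mid(p), _mid(q)) for p, q in pairs]
    angles = [_angle(_sub(p[1], p[0]), _sub(q[1], q[0])) for p, q in pairs]
    return tuple(dists + angles)

def cell(vec, dist_step=2.0, angle_step=0.5):
    """Quantize a feature vector into a grid cell (stand-in for a real index)."""
    steps = (dist_step,) * 3 + (angle_step,) * 3
    return tuple(int(v // s) for v, s in zip(vec, steps))

def build_index(proteins):
    """proteins: dict mapping a protein name to its list of SSEs, in order."""
    index = defaultdict(set)
    for name, sses in proteins.items():
        for a, b, c in itertools.combinations(sses, 3):
            index[cell(triplet_features(a, b, c))].add(name)
    return index

def candidates(index, query_sses, top_k=5):
    """Rank proteins by how many query triplets land in a cell they occupy."""
    votes = Counter()
    for a, b, c in itertools.combinations(query_sses, 3):
        for name in index.get(cell(triplet_features(a, b, c)), ()):
            votes[name] += 1
    return [name for name, _ in votes.most_common(top_k)]
```

Only the proteins returned by `candidates` would then be handed to a pairwise alignment tool. A grid lookup misses near matches that fall just across a cell boundary, which is one reason a range query on a proper multidimensional index is the better fit in practice.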

Orhan Çamoglu, Tamer Kahveci, Ambuj K. Singh
