Existing sequence comparison software applications lack automation, abstraction, performance, and flexibility. Users need a new way of studying and applying sequence comparisons in the post-genomics era. We invented and developed a new bio-specific Database Management System perator, Similar_Join, to abstract the labor-intensive batch sequence similarity search task into a syntactically concise database operation. We implemented the Similar_Join operator as part of a relational operator package. This implementation enabled us to write simple PL/SQL scripts within the DBMS to accomplish routine sequence similarity searches conveniently, for example, a “batch BLAST” that compares 7,000 human genes against 500,000 human Expressed Sequence Tags (EST) in a few hours. We also implemented a simple version of Similar_Join as a database operator in the extended data cartridge of Oracle 8i object-relational DBMS. When fully integrated into SQL language extensions, we demonstrated this opera...
Jake Yue Chen, John V. Carlis