BLASTP is the most popular tool for comparative analysis of protein sequences. The exponential growth of protein sequence databases in recent years has required either exponentially more runtime or a cluster of machines to keep pace. To address this growing problem, we have designed and built a high-performance, FPGA-accelerated version of BLASTP, Mercury BLASTP. In this paper we focus on seed generation, the first stage of the BLASTP algorithm. Our seed generator can process database residues at up to 219 Mresidues/second for 2048-residue queries. With the seed generator integrated, Mercury BLASTP achieves a speedup of 41× over stock NCBI BLASTP, with high sensitivity. Additionally, the architecture can be generalised to accelerate the seed generation stage in other important biocomputing applications.
Arpith C. Jacob, Joseph M. Lancaster, Jeremy Buhle