Abstract. Several individual bioinformatics applications (particularly BLAST and other sequence searching methods) have recently been implemented over clusters of workstations to take advantage of extra processing power. These implementations achieve performance improvements for increasingly large sets of input data (sequences and databases). We present an analysis of programs in the EMBOSS suite based on increasing sequence size, and implement these programs in parallel over a cluster of workstations using sequence segmentation with overlap. We observe general increases in runtime for all programs as sequence size grows, and examine the speedup of the most computationally intensive ones to establish an optimum segmentation size for those programs across the cluster.
Karl Podesta, Martin Crane, Heather J. Ruskin
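
As an illustration of the sequence segmentation with overlap mentioned in the abstract, the following is a minimal sketch in Python. It assumes a sequence held as a plain string, a fixed segment size, and a fixed overlap between consecutive segments. The function name `segment_with_overlap` and its parameters are hypothetical and are not taken from EMBOSS or from the implementation described here.

```python
def segment_with_overlap(sequence, segment_size, overlap):
    """Split a sequence into fixed-size segments that share `overlap` residues.

    Hypothetical helper for illustration only. Returns a list of
    (start_offset, segment) pairs so that results computed on a segment
    can be mapped back to coordinates in the original sequence.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    if not 0 <= overlap < segment_size:
        raise ValueError("overlap must satisfy 0 <= overlap < segment_size")
    if not sequence:
        return []

    step = segment_size - overlap  # distance between consecutive segment starts
    segments = []
    start = 0
    while True:
        segments.append((start, sequence[start:start + segment_size]))
        # Stop once the current segment reaches the end of the sequence.
        if start + segment_size >= len(sequence):
            break
        start += step
    return segments


if __name__ == "__main__":
    # Each segment could then be handed to a separate cluster node.
    for offset, segment in segment_with_overlap("ACGTACGTAC", 4, 2):
        print(offset, segment)
```

The overlap means that a feature spanning a segment boundary still appears whole in at least one segment, provided the feature is no longer than the overlap plus one residue. The cost is that the overlapping regions are processed twice, which is one reason the segment size affects the achievable speedup.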