Sciweavers

DCC
2006
IEEE

On Compressibility of Protein Sequences

We consider the problem of compressibility of protein sequences. Based on an observed genome-scale long-range correlation in concatenated protein sequences from different organisms, we propose a method to exploit this unusual redundancy in compressing the protein sequences. The result is a significant reduction in the number of bits required for representing the sequences. We report results in bits per symbol (bps) of 2.27, 2.55, 3.11 and 3.44 for protein sequences from M. jannaschii, H. influenzae, S. cerevisiae, and H. sapiens respectively, the same protein sequences used by Nevill-Manning and Witten in the "Protein is incompressible" paper [23]. The observed long-range correlations could have significant implications beyond compression and complexity analysis of protein sequences.
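To put the bits-per-symbol figures in context, the sketch below computes two simple reference points: the cost of a fixed-length code over the 20 standard amino acids, and the zero-order (single-symbol frequency) entropy of a sequence. It is only an illustration of the metric, not the authors' compression method, which exploits long-range correlations. The function name `zero_order_entropy_bps` and the toy sequence are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def zero_order_entropy_bps(sequence: str) -> float:
    """Empirical zero-order entropy of a sequence, in bits per symbol.

    Only single-symbol frequencies are used, so any correlation between
    positions (short-range or long-range) is ignored.
    """
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Fixed-length baseline: 20 equally likely amino acids, about 4.32 bits each.
uniform_baseline_bps = math.log2(20)

# Hypothetical toy sequence, used only to exercise the function.
toy_sequence = "MKVLAAGIVGLLLAQ"

print(f"uniform baseline: {uniform_baseline_bps:.2f} bps")
print(f"toy zero-order entropy: {zero_order_entropy_bps(toy_sequence):.2f} bps")
```

All four reported rates (2.27 to 3.44 bps) fall below the fixed-length baseline of about 4.32 bps, which is the sense in which the sequences are compressible.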
Donald A. Adjeroh, Fei Nan
Added 25 Dec 2009
Updated 25 Dec 2009
Type Conference
Year 2006
Where DCC