Globally, unrelated protein sequences appear random


Motivation: To test whether protein folding constraints and secondary structure sequence preferences significantly reduce the space of amino acid words in proteins, we compared the frequencies of four- and five-amino acid word clumps (independent words) in proteins to the frequencies predicted by four random sequence models.

Results: While the human proteome has many overrepresented word clumps, these words come from large protein families with biased compositions (e.g. Zn-fingers). In contrast, in a non-redundant sample of Pfam-AB, only 1% of four-amino acid word clumps (4.7% of 5mer words) are 2-fold overrepresented compared with our simplest random model [MC(0)], and 0.1% (4mers) to 0.5% (5mers) are 2-fold overrepresented compared with a window-shuffled random model. Using a false discovery rate q-value analysis, the number of exceptional four- or five-letter words in real proteins is similar to the number found when comparing words from one random model to another. Consensus overr...
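The comparison against the simplest random model can be illustrated with a small sketch. It assumes that MC(0) means a zero-order Markov chain, i.e. residues drawn independently according to the overall amino acid composition, and it counts plain overlapping word occurrences rather than the word clumps used in the paper. The function names (`word_counts`, `mc0_expected`, `overrepresented`) are hypothetical and are not taken from the authors' software.

```python
from collections import Counter
from math import prod


def word_counts(seqs, k):
    """Count every overlapping k-letter word in a list of sequences.

    Simplification: the paper counts word clumps (independent words);
    this sketch counts all overlapping occurrences instead.
    """
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def mc0_expected(seqs, k):
    """Return a function giving the expected count of a word under an
    assumed zero-order model: independent residues with the observed
    overall composition."""
    comp = Counter("".join(seqs))
    total = sum(comp.values())
    freq = {a: n / total for a, n in comp.items()}
    # Number of positions at which a k-letter word can start.
    positions = sum(max(len(s) - k + 1, 0) for s in seqs)

    def expected(word):
        return positions * prod(freq[a] for a in word)

    return expected


def overrepresented(seqs, k=4, fold=2.0):
    """Map each word whose observed/expected ratio is at least `fold`
    to that ratio."""
    counts = word_counts(seqs, k)
    expected = mc0_expected(seqs, k)
    ratios = {w: c / expected(w) for w, c in counts.items()}
    return {w: r for w, r in ratios.items() if r >= fold}
```

Calling `overrepresented(sequences, k=4)` on a list of protein strings returns the 4mers that are at least 2-fold overrepresented under this simplified model. A fuller analysis along the lines of the abstract would also need clump counting, the other random models, and a false discovery rate correction, none of which are sketched here.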

Daniel T. Lavelle, William R. Pearson


Acid Word | BIOINFORMATICS 2010 | Proteins | Word Clumps |


Post Info

Added	08 Dec 2010
Updated	08 Dec 2010
Type	Journal
Year	2010
Where	BIOINFORMATICS
Authors	Daniel T. Lavelle, William R. Pearson
