Identification of related gene/protein names based on an HMM of name variations



Gene and protein names follow few, if any, true naming conventions and are subject to great variation in different occurrences of the same name. This gives rise to two important problems in natural language processing. First, can one locate the names of genes or proteins in free text, and second, can one determine when two names denote the same gene or protein? The first of these problems is a special case of the problem of named entity recognition, while the second is a special case of the problem of automatic term recognition (ATR). We study the second problem, that of gene or protein name variation. Here we describe a system which, given a query gene or protein name, identifies related gene or protein names in a large list. The system is based on a dynamic programming algorithm for sequence alignment in which the mutation matrix is allowed to vary under the control of a fully trainable hidden Markov model.
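As a rough illustration of the alignment idea described above, the following minimal sketch scores two names with a standard global (Needleman-Wunsch style) dynamic programming alignment and ranks a list of candidates against a query. It is not the authors' system: the abstract says the mutation matrix varies under the control of a trained hidden Markov model, whereas this sketch substitutes a single fixed, hand-picked scoring function. The names `substitution_score`, `alignment_score`, `rank_related`, and `GAP`, and all numeric scores, are assumptions made for illustration only.

```python
# Illustrative sketch only: fixed substitution scores stand in for the
# HMM-controlled mutation matrix described in the abstract.

GAP = -1.0  # hypothetical penalty for inserting or deleting one character


def substitution_score(a: str, b: str) -> float:
    """Hypothetical score for aligning character a with character b."""
    if a == b:
        return 2.0  # exact match
    if a.lower() == b.lower():
        return 1.5  # same letter, different case
    if a in "-_ /" and b in "-_ /":
        return 1.0  # separators are treated as near-interchangeable
    return -1.0  # any other mismatch


def alignment_score(query: str, candidate: str) -> float:
    """Global alignment score of two names via dynamic programming."""
    rows, cols = len(query) + 1, len(candidate) + 1
    score = [[0.0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * GAP
    for j in range(1, cols):
        score[0][j] = j * GAP
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1]
                + substitution_score(query[i - 1], candidate[j - 1]),
                score[i - 1][j] + GAP,  # gap in candidate
                score[i][j - 1] + GAP,  # gap in query
            )
    return score[-1][-1]


def rank_related(query: str, names: list[str]) -> list[str]:
    """Return names sorted from most to least similar to the query."""
    return sorted(names, key=lambda n: alignment_score(query, n), reverse=True)
```

For example, `rank_related("IL-2", ["p53", "IL 2", "IL-2"])` places the exact match first and the separator variant second. In the system the abstract describes, the substitution scores would instead be learned and would change with the HMM state as the alignment proceeds.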

Lana Yeganova, Lawrence H. Smith, W. John Wilbur


CANDC 2004 | Emerging Technology | Gene | Protein | Protein Names |


Added	16 Dec 2010
Updated	16 Dec 2010
Type	Journal
Year	2004
Where	CANDC
Authors	Lana Yeganova, Lawrence H. Smith, W. John Wilbur
