AZuRE, a Scalable System for Automated Term Disambiguation of Gene and Protein Names


Researchers, hindered by the lack of standard gene and protein naming conventions, endure long and sometimes fruitless literature searches. This paper describes a system that automatically assigns gene names to their LocusLink IDs (LLIDs) in previously unseen abstracts. The system is based on supervised learning and builds one model per LLID. The training sets for all LLIDs are extracted automatically from MEDLINE references in the LocusLink and SwissProt databases. Performance was validated for all 20,546 human genes with LLIDs. Of these, 7,344 produced good-quality models (F-measure > 0.7, nearly 60% of which were > 0.9) and 13,202 did not, mainly because too few document references were known for them. A hand validation of MEDLINE documents for a set of 66 genes agreed well with the system's internal accuracy assessment. The authors conclude that high-quality gene disambiguation can be achieved with scalable automated techniques.
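The quality criterion in the abstract (a per-LLID model counts as good when its F-measure exceeds 0.7) can be illustrated with a minimal sketch. It assumes the balanced F-measure (F1), which the abstract does not specify, and the function names, LLID keys, and counts are hypothetical illustrations rather than anything taken from AZuRE.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true-positive, false-positive and
    false-negative counts. Assumption: the abstract's F-measure is F1."""
    if tp == 0:
        return 0.0  # no correct assignments: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def split_models(counts_by_llid, threshold=0.7):
    """Partition per-LLID models into good and poor by F-measure.

    counts_by_llid maps a (hypothetical) LLID to a (tp, fp, fn) tuple
    from validating that LLID's model.
    """
    good, poor = {}, {}
    for llid, (tp, fp, fn) in counts_by_llid.items():
        score = f_measure(tp, fp, fn)
        target = good if score > threshold else poor
        target[llid] = score
    return good, poor


# Made-up validation counts for two hypothetical LLIDs.
example_counts = {"LLID_A": (9, 1, 1), "LLID_B": (1, 4, 4)}
good_models, poor_models = split_models(example_counts)
```

With these made-up counts, `LLID_A` lands in the good set and `LLID_B` in the poor set. A model trained on very few known references will tend to fall below the threshold, which matches the abstract's explanation for the 13,202 genes without good-quality models.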

Raf M. Podowski, John G. Cleary, Nicholas T. Goncharoff, Gregory Amoutzias, William S. Hayes


Bioinformatics | CSB 2004 | Gene | Gene Names | Quality Gene Disambiguation |


Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2004
Where	CSB
Authors	Raf M. Podowski, John G. Cleary, Nicholas T. Goncharoff, Gregory Amoutzias, William S. Hayes
