
CSB
2004
IEEE

AZuRE, a Scalable System for Automated Term Disambiguation of Gene and Protein Names

Researchers, hindered by a lack of standard gene- and protein-naming conventions, endure long, sometimes fruitless, literature searches. A system is described that automatically assigns gene names to their LocusLink IDs (LLIDs) in previously unseen abstracts. The system is based on supervised learning and builds a model for each LLID. The training sets for all LLIDs are extracted automatically from MEDLINE references in the LocusLink and SwissProt databases. Performance was validated for all 20,546 human genes with LLIDs. Of these, 7,344 produced good-quality models (F-measure > 0.7, nearly 60% of which were > 0.9) and 13,202 did not, mainly because too few document references were known for them. A hand validation of MEDLINE documents for a set of 66 genes agreed well with the system's internal accuracy assessment. It is concluded that high-quality gene disambiguation can be achieved using scalable automated techniques.
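The quality criterion in the abstract (F-measure > 0.7 per LLID model) can be illustrated with a minimal sketch. It assumes the balanced F-measure (F1), which the abstract does not specify, and the function names and example counts are hypothetical, not taken from AZuRE itself.

```python
GOOD_MODEL_THRESHOLD = 0.7  # threshold quoted in the abstract


def f_measure(tp, fp, fn):
    """Balanced F-measure (F1, an assumption) from per-model counts.

    tp: documents correctly assigned to the LLID
    fp: documents wrongly assigned to the LLID
    fn: documents for the LLID that the model missed
    """
    if tp == 0:
        # No correct assignments: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def is_good_model(tp, fp, fn, threshold=GOOD_MODEL_THRESHOLD):
    """Return True when the model's F-measure exceeds the threshold."""
    return f_measure(tp, fp, fn) > threshold


# The two groups reported in the abstract account for every gene evaluated.
assert 7_344 + 13_202 == 20_546
```

A model with few known document references yields small counts, so a handful of errors is enough to push its F-measure below the threshold, which is consistent with the reason the abstract gives for the 13,202 models that fell short.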
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2004
Where CSB
Authors Raf M. Podowski, John G. Cleary, Nicholas T. Goncharoff, Gregory Amoutzias, William S. Hayes