Identification of Chemical Entities in Patent Documents



Biomedical literature is an important source of information about chemical compounds. However, chemical entities have many different representations and nomenclatures, which makes references to them ambiguous. Many systems already exist for gene and protein entity recognition, but very few exist for chemical entities. The main reason is the lack of corpora for training and evaluating named entity recognition systems. In this paper we present a chemical entity recognizer that uses a machine learning approach based on conditional random fields (CRF), and we compare its performance with dictionary-based approaches that use several terminological resources. For training and evaluation, a gold standard of manually curated patent documents was used. While the dictionary-based systems perform well in partial identification of chemical entities, the machine learning approach performs better (10% increase in F-score in comparison to the best dictionary-based system) w...



Artificial Intelligence | Chemical Entities | Entity Recognition | IWANN 2009 | Machine Learning Approach



Added	27 May 2010
Updated	27 May 2010
Type	Conference
Year	2009
Where	IWANN
Authors	Tiago Grego, Piotr Pezik, Francisco M. Couto, Dietrich Rebholz-Schuhmann

