
PKDD
2007
Springer

Finding Transport Proteins in a General Protein Database

The number of specialized databases in molecular biology is growing fast, as is the availability of molecular data. These trends necessitate the development of automatic methods for finding relevant information to include in specialized databases. We show how to use a comprehensive database (SwissProt) as a source of new entries for a specialized database (TCDB, the Transport Classification Database). Even carefully constructed keyword-based queries perform poorly in determining which SwissProt records are relevant to TCDB; we show that a machine learning approach performs well. We describe a maximum-entropy classifier, trained on SwissProt records, that achieves high precision and recall in cross-validation experiments. This classifier has been deployed as part of a pipeline for updating TCDB that allows a human expert to examine only about 2% of SwissProt records for potential inclusion in TCDB. The methods we describe are flexible and general, so they can be applied easily to o...
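As a rough illustration of the approach the abstract describes, the sketch below trains a binary maximum-entropy classifier (equivalent to logistic regression) on bag-of-words features and then flags high-scoring records for expert review. It is not the authors' implementation: the toy records, labels, function names, hyperparameters, and the 0.5 review threshold are all hypothetical, chosen only to make the example self-contained.

```python
import math
from collections import defaultdict

# Hypothetical toy records standing in for SwissProt annotation text.
# Label 1 = relevant to a transporter database, 0 = not relevant.
RECORDS = [
    "sodium potassium transporter membrane channel",
    "abc transporter permease membrane",
    "ion channel membrane pore",
    "dna polymerase replication nucleus",
    "ribosomal protein translation cytoplasm",
    "kinase phosphorylation signaling cytoplasm",
]
LABELS = [1, 1, 1, 0, 0, 0]


def tokenize(text):
    """Lowercase, whitespace-split bag of words (binary features)."""
    return set(text.lower().split())


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_maxent(records, labels, epochs=200, lr=0.5, l2=0.01):
    """Fit a binary maximum-entropy model by batch gradient ascent
    on the L2-regularized log-likelihood."""
    weights = defaultdict(float)
    bias = 0.0
    docs = [tokenize(r) for r in records]
    n = len(docs)
    for _ in range(epochs):
        grad = defaultdict(float)
        grad_bias = 0.0
        for feats, y in zip(docs, labels):
            p = sigmoid(bias + sum(weights[f] for f in feats))
            err = y - p  # gradient of the log-likelihood w.r.t. the score
            grad_bias += err
            for f in feats:
                grad[f] += err
        bias += lr * grad_bias / n
        for f, g in grad.items():
            weights[f] += lr * (g / n - l2 * weights[f])
    return dict(weights), bias


def predict_proba(model, text):
    """Estimated probability that a record is relevant."""
    weights, bias = model
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in tokenize(text)))


def flag_for_review(model, records, threshold=0.5):
    """Return only the records an expert would be asked to examine."""
    return [r for r in records if predict_proba(model, r) >= threshold]


model = train_maxent(RECORDS, LABELS)
candidates = flag_for_review(model, RECORDS + ["membrane transporter"])
```

In a pipeline of this shape, the threshold controls the trade-off between how many records the expert must read and how many relevant records are missed; the value used here is arbitrary.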
Sanmay Das, Milton H. Saier Jr., Charles Elkan
Added 09 Jun 2010
Updated 09 Jun 2010
Type Conference
Year 2007
Where PKDD