Rule-based knowledge aggregation for large-scale protein sequence analysis of influenza A viruses

Download www.biomedcentral.com

Background: The explosive growth of biological data provides opportunities for new statistical and comparative analyses of large information sets, such as alignments comprising tens of thousands of sequences. In such studies, sequence annotations frequently play an essential role, and reliable results depend on metadata quality. However, the semantic heterogeneity and annotation inconsistencies in biological databases greatly increase the complexity of aggregating and cleaning metadata. Manual curation of datasets, traditionally favoured by life scientists, is impractical for studies involving thousands of records. In this study, we investigate quality issues that affect major public databases, and quantify the effectiveness of an automated metadata extraction approach that combines structural and semantic rules. We applied this approach to more than 90,000 influenza A records, to annotate sequences with protein name, virus subtype, isolate, host, geographic origin, and year of isolat...
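As an illustration of what combining structural and semantic rules might look like, the following minimal sketch pulls several of the fields named above (subtype, isolate, host, geographic origin, year) out of a record description. It is not the authors' implementation: the function `extract_metadata`, the pattern `STRAIN_RE`, and the `HOST_SYNONYMS` vocabulary are hypothetical, and the structural rule assumes the conventional strain-name layout A/host/location/isolate/year(subtype), in which human isolates usually omit the host.

```python
import re

# Structural rule (assumed): the conventional strain-name layout
# A/[host/]location/isolate/year(subtype); the host part is optional.
STRAIN_RE = re.compile(
    r"A/(?:(?P<host>[^/()]+)/)?(?P<origin>[^/()]+)/(?P<isolate>[^/()]+)/(?P<year>\d{4})"
    r"\s*(?:\((?P<subtype>H\d+N\d+)\))?"
)

# Semantic rule (hypothetical vocabulary): map variant host spellings to one term.
HOST_SYNONYMS = {"ck": "chicken", "dk": "duck", "sw": "swine", "homo sapiens": "human"}


def extract_metadata(description):
    """Return a metadata dict for one record description, or None if no rule fires."""
    match = STRAIN_RE.search(description)
    if match is None:
        return None
    fields = match.groupdict()
    # A strain name without a host part is treated as a human isolate (assumption).
    host = (fields["host"] or "human").strip().lower()
    return {
        "host": HOST_SYNONYMS.get(host, host),
        "origin": fields["origin"].strip(),
        "isolate": fields["isolate"].strip(),
        "year": int(fields["year"]),
        "subtype": fields["subtype"],  # None when the subtype is not stated
    }


record = extract_metadata("Influenza A virus (A/Ck/Hong Kong/220/1997(H5N1)) hemagglutinin")
```

A real pipeline would presumably need many more rules, plus a way to reconcile conflicting values drawn from different record fields; this sketch only shows the shape of one structural rule followed by one semantic normalization step.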

Olivo Miotto, Tin Wee Tan, Vladimir Brusic

BMCBI 2008 | Metadata | Public Databases | Semantic Heterogeneity |

Post Info

Added	09 Dec 2010
Updated	09 Dec 2010
Type	Journal
Year	2008
Where	BMCBI
Authors	Olivo Miotto, Tin Wee Tan, Vladimir Brusic
