In this paper we outline requirements for an open platform designed for the description, distribution and analysis of genetic polymorphism data. We discuss this platform in terms of our implementation of a phenotypic prediction pipeline with general application to the understanding of genetic variation. The current state of polymorphism data storage and distribution has several recognised deficiencies, including the lack of a shared data model and low overlap between databases. To address these limitations, we propose a universal data model for polymorphism data called Biological Variation Markup Language (BVML). We suggest an aggregation system for pooling Resource Description Framework (RDF) descriptions of polymorphism databases to form distributed, federated database indexes, which will facilitate the collaborative involvement of numerous laboratories. An ad hoc query interface for data mining using the Extensible Markup Language (XML) messaging standard si...
Greg D. Tyrelle, Garry C. King