High throughput glycoproteomics, similar to genomics and proteomics, involves extremely large volumes of distributed, heterogeneous data as a basis for identification and quantification of a structurally diverse collection of biomolecules. The ability to share, compare, query for and most critically correlate datasets using the native biological relationships are some of the challenges being faced by glycobiology researchers. As a solution for these challenges, we are building a semantic structure, using a suite of ontologies, which supports management of data and information at each step of the experimental lifecycle. This framework will enable researchers to leverage the large scale of glycoproteomics data to their benefit. In this paper, we focus on the design of these biological ontology schemas with an emphasis on relationships between biological concepts, on the use of novel approaches to populate these complex ontologies including integrating extremely large datasets (~500MB) a...
Satya Sanket Sahoo, Christopher Thomas, Amit P. Sh