A huge diversity of biological databases is available via the Internet, but many of these databases have been developed in an ad hoc manner rather than in accordance with any data management principles. In addition, many disordered protein databases have not been made publicly available. This poses challenges to researchers, since reliable protein databases are required to test and measure the accuracy of protein structure prediction software. In this paper, we describe our work developing a disordered protein database using data from the protein secondary structure database DSSPcont. In particular, we discuss how we have addressed the issues of data cleaning, query processing and interoperability. This research is a pilot study in managing biological data.
Arran D. Stewart, Xiuzhen Zhang