Early work on creating the database for the New Oxford English Dictionary has shown that databases for reference texts are unlike those for conventional enterprises. Although the traditional approaches to database design and development remain sound, the particular techniques used for commercial databases have repeatedly proven inappropriate for text-dominated databases such as the New OED. Just as the relational model grew out of experience with earlier database approaches, the grammar-based model presented here builds on the traditional foundations of computer science, particularly database theory and practice. The model uses grammars as schemas and "parsed strings" as instances. Operators on parsed strings are defined, yielding a "p-string algebra" that can be used for data manipulation and view definition. The model is representation-independent and the operators are non-navigational, so that ...
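To make the schema/instance distinction concrete, here is a minimal Python sketch. It is not the authors' p-string algebra: the `Node` class, the encoding of each production as a regular expression over child labels, the toy `GRAMMAR`, and the `conforms` and `select` functions are all hypothetical illustrations. A parsed string is modeled as a tree whose leaves are the original text, the grammar acts as the schema a tree must conform to, and `select` stands in for a non-navigational operator that retrieves subtrees by grammar label rather than by an explicit access path.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Node:
    """One node of a hypothetical parsed string: a grammar label over a span of text."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)

    def text(self) -> str:
        # The underlying string is the concatenation of all leaf text, in order.
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)


# Hypothetical schema: each production is a regex over space-separated child labels.
# Labels without a production are treated as text-only leaves.
GRAMMAR: Dict[str, str] = {
    "entry": r"headword( sense)+",
    "sense": r"definition",
}


def conforms(node: Node, grammar: Dict[str, str]) -> bool:
    """Check that a parse tree is an instance of the grammar (the schema)."""
    if node.label not in grammar:
        # Leaf-level label: may contain only raw text.
        return all(isinstance(c, str) for c in node.children)
    if any(isinstance(c, str) for c in node.children):
        # In this sketch, structured nodes hold only labeled children.
        return False
    labels = " ".join(c.label for c in node.children)
    if re.fullmatch(grammar[node.label], labels) is None:
        return False
    return all(conforms(c, grammar) for c in node.children)


def select(node: Node, label: str) -> List[Node]:
    """Hypothetical non-navigational operator: every subtree carrying `label`,
    in document order, found without the caller naming a path."""
    found = [node] if node.label == label else []
    for c in node.children:
        if isinstance(c, Node):
            found.extend(select(c, label))
    return found


# Illustrative instance (invented text, not taken from the OED).
entry = Node("entry", [
    Node("headword", ["grammar"]),
    Node("sense", [Node("definition", ["the rules of a language"])]),
    Node("sense", [Node("definition", ["a book describing those rules"])]),
])

print(conforms(entry, GRAMMAR))  # True
print([d.text() for d in select(entry, "definition")])
# ['the rules of a language', 'a book describing those rules']
```

Returning subtrees from `select`, rather than copies of their text, keeps each result a parsed string in its own right, so further operators could be applied to it; this is a design choice of the sketch, not a claim about the paper's operators.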
Gaston H. Gonnet, Frank Wm. Tompa