Systems designed for efficient retrieval of conventional data can be very inefficient at retrieving documents. Documents have more complex structure than conventional data, and the kinds of queries made to document databases are unlike those made to conventional databases. This paper discusses how document storage and retrieval can be effectively supported in a nested relational database system with signature file indexing, and gives a detailed analysis of the space requirements and retrieval times of different document schemas in such a database system.
Justin Zobel, James A. Thom, Ron Sacks-Davis